Let $p_n(y) = \sum_k \hat\alpha_k \phi(y-k) + \sum_{l=0}^{j_n-1} \sum_k \hat\beta_{lk} 2^{l/2} \psi(2^l y - k)$ be the
linear wavelet density estimator, where $\phi$, $\psi$ are a father and a mother
wavelet (with compact support), $\hat\alpha_k$, $\hat\beta_{lk}$ are the empirical wavelet
coefficients based on an i.i.d. sample of random variables distributed
according to a density $p_0$ on $\mathbb{R}$, and $j_n \in \mathbb{Z}$, $j_n \nearrow \infty$. Several uniform
limit theorems are proved: First, the almost sure rate of convergence
of $\sup_{y \in \mathbb{R}} |p_n(y) - Ep_n(y)|$ is obtained, and a law of the logarithm for
a suitably scaled version of this quantity is established. This implies
that $\sup_{y \in \mathbb{R}} |p_n(y) - p_0(y)|$ attains the optimal almost sure rate of
convergence for estimating $p_0$, if $j_n$ is suitably chosen. Second, a
uniform central limit theorem as well as strong invariance principles
for the distribution function of $p_n$, that is, for the stochastic processes
$\sqrt{n}(F_n^W(s) - F(s)) = \sqrt{n} \int_{-\infty}^s (p_n - p_0)$, $s \in \mathbb{R}$, are proved; and more
generally, uniform central limit theorems for the processes $\sqrt{n} \int (p_n - p_0) f$,
$f \in \mathcal{F}$, for other Donsker classes $\mathcal{F}$ of interest are considered.
As a statistical application, it is shown that essentially the same limit 
theorems can be obtained for the hard thresholding wavelet estimator 
introduced by Donoho et al. [Ann. Statist. 24 (1996) 508-539]. 

1. Introduction. Let $X, X_1, \ldots, X_n$ be independent identically distributed
real-valued random variables with absolutely continuous law $P$ and density
$p_0$, and denote by $P_n$ the usual empirical measure induced by the sample. If
$\phi$ is a bounded and compactly supported father wavelet (scaling function)
and $\psi$ an associated mother wavelet, the (linear) wavelet density estimator
of $p_0$ is given by

(1) $p_n(y) = \sum_{k \in \mathbb{Z}} \hat\alpha_k \phi(y-k) + \sum_{l=0}^{j_n-1} \sum_{k \in \mathbb{Z}} \hat\beta_{lk} 2^{l/2} \psi(2^l y - k)$,

where $\hat\alpha_k = \int \phi(x-k)\,dP_n(x)$, $\hat\beta_{lk} = \int 2^{l/2} \psi(2^l x - k)\,dP_n(x)$ and where
$j_n \nearrow \infty$. This estimator was introduced in Doukhan and Leon (1990) and
Kerkyacharian and Picard (1992). The latter authors proved — using wavelet 
theory as established by Daubechies (1992), Meyer (1992) and others — that 
this estimator is, for a suitable choice of $j_n$, an optimal estimator of $p_0$ in
$\mathcal{L}^p$ loss, $1 < p < \infty$, if $p_0$ belongs to a Besov space $B^t_{pq}(\mathbb{R})$. Furthermore,
"nonlinear" modifications of $p_n$ were shown to be optimal even in more general
settings, including, in particular, the case when $t$ is unknown [see Donoho,
Johnstone, Kerkyacharian and Picard (1995, 1996), Delyon and Juditsky 
(1996), Kerkyacharian, Picard and Tribouley (1996), Hall, Kerkyacharian 
and Picard (1998), Juditsky and Lambert-Lacroix (2004) and others]. The 
linear estimator is part of the analysis of these more complex nonlinear es- 
timators. We refer to the monographs Hardle, Kerkyacharian, Picard and 
Tsybakov (1998) and Vidakovic (1999) for a general treatment of the use of 
wavelets in statistics. 

In this article, we have three main goals: the first two consist in studying 
the limiting behavior of the linear estimator $p_n(y)$ both as an estimator for
the true density function $p_0(y)$ and as an estimator $F_n^W(s) = \int_{-\infty}^s p_n(y)\,dy$
for the true distribution function $F(s) = \int_{-\infty}^s p_0(y)\,dy$, in sup-norm loss.
Third, as a statistical application, we consider the same problems for a
nonlinear modification of $p_n$, namely the "hard thresholding" wavelet density
estimator. 

In the first case, we show that under mild conditions,

(2) $\sup_{y \in \mathbb{R}} |p_n(y) - Ep_n(y)| = O\left(\sqrt{\frac{j_n 2^{j_n}}{n}}\right)$ a.s.;

in fact we obtain an exact law of the logarithm for a suitably scaled version
of $p_n - Ep_n$, somewhat analogous to that of Deheuvels (2000) and Gine
and Guillou (2002) for the Rosenblatt-Parzen kernel density estimator. A
corollary of the proof also recovers, under weaker conditions, a result of 
Massiani (2003), where the supremum is taken over a bounded interval, as 
in the classical law of the logarithm of Stute (1982) for the Rosenblatt- 
Parzen estimator. The result (2) implies that, if $p_0$ is in the Besov space
$B^t_{\infty\infty}(\mathbb{R})$ (or in the corresponding Holder space of order $t$), then

(3) $\sup_{y \in \mathbb{R}} |p_n(y) - p_0(y)| = O\left(\left(\frac{\log n}{n}\right)^{t/(2t+1)}\right)$ a.s.

if one chooses $2^{j_n} \simeq (n/\log n)^{1/(2t+1)}$, which is the optimal rate of
convergence in sup-norm loss. These results are complemented by expectation
bounds and convergence of Laplace transforms. 
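
As a rough numerical illustration of the resolution choice $2^{j_n} \simeq (n/\log n)^{1/(2t+1)}$, the following Python sketch picks an integer level and evaluates the corresponding sup-norm rate $(\log n/n)^{t/(2t+1)}$. The function names are hypothetical and not taken from the paper, the rounding rule is one arbitrary way to obtain an integer $j_n$, and the smoothness $t$ is assumed known, which is exactly what the thresholding estimator discussed later avoids.

```python
import math

def resolution_level(n, t):
    """Integer j with 2^j close (on the log2 scale) to (n / log n)^(1/(2t+1)).

    Assumes n > 1 and t > 0; rounding to the nearest integer is an
    illustrative choice, not a rule stated in the text.
    """
    return max(0, round(math.log2(n / math.log(n)) / (2 * t + 1)))

def sup_norm_rate(n, t):
    """The rate (log n / n)^(t/(2t+1)), up to constants."""
    return (math.log(n) / n) ** (t / (2 * t + 1))

def stochastic_term(n, j):
    """Order of the fluctuation term sqrt(j 2^j / n), up to constants."""
    return math.sqrt(j * 2 ** j / n)
```

For example, `resolution_level(10**6, 1.0)` balances a bias proxy `2 ** (-j * t)` against `stochastic_term(n, j)` only up to constant factors, so the output should be read as an order of magnitude rather than a tuned choice.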

In the second case, we show, for jn as in the previous paragraph (and 
other choices), that the processes 

(4) $\sqrt{n}(F_n^W - F)(s)$, $s \in \mathbb{R}$,

converge in law in the Banach space of bounded functions on R to the P- 
Brownian bridge process, and that

$\|F_n^W - F\|_\infty = O\left(\sqrt{\frac{\log\log n}{n}}\right)$ a.s.;

in fact, we obtain an exact law of the iterated logarithm and a strong
approximation result. More generally, we then also prove uniform central limit
theorems for the processes 

$\sqrt{n} \int_{\mathbb{R}} (p_n(y) - p_0(y)) f(y)\,dy$, $f \in \mathcal{F}$,

for several (Donsker) classes of functions $\mathcal{F}$. These results again parallel
limit theorems for the classical Rosenblatt-Parzen estimator [see Bickel and 
Ritov (2003), Gine and Nickl (2008, 2009)]. 

To motivate the relevance of our third goal, note that the resolution $j_n$ under
which the linear estimator achieves the optimal rate (3) for $p_0 \in B^t_{\infty\infty}(\mathbb{R})$
depends on $t$, which is typically unknown. To remedy this, Donoho et al.
(1996) introduced (soft and hard) thresholding wavelet estimators: one first
chooses $j_n$ sufficiently large and independent of $t$, and then deletes the
wavelet coefficients $\hat\beta_{lk}$ in (1) in a certain range of $l$'s if they are smaller than
a certain threshold. This estimator does not depend on $t$ anymore, but still
achieves rates of convergence in the $\mathcal{L}^p$-loss, $1 < p < \infty$, that are optimal up
to a logarithmic factor, uniformly over compactly supported densities that are
contained in balls of Besov spaces $B^t_{pq}(\mathbb{R})$, with $t$ unknown (but bounded).
We show, as an application of our results for the linear estimator, that their
hard thresholding estimator is exact rate adaptive in the sup-norm, that is,
it achieves the optimal rate (2) in the sup-norm, even without a logarithmic
penalty, for (not necessarily compactly supported) $p_0$ in $B^t_{\infty\infty}(\mathbb{R})$, and any
unspecified (but bounded) $t$. (In fact, this implies optimality over balls of
densities in $B^t_{pq}(\mathbb{R})$ as well, cf. Remark 8 below.) Moreover, we prove that the
hard thresholding wavelet density estimator also satisfies the central limit 
theorem (4). Hence this remarkable estimator is not only rate-adaptive in 
sup-norm loss, but also satisfies Bickel and Ritov's (2003) plug-in property. 
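
The thresholding idea described above can be sketched in Python for the Haar wavelet. This is only an illustration under stated assumptions: the Haar pair is chosen for simplicity, the parameters `j0`, `j1` and the callable `threshold` are hypothetical names, and the actual threshold values and range of levels used by Donoho et al. (1996) are not reproduced here.

```python
import math

def phi(u):
    """Haar father wavelet 1_(0,1]."""
    return 1.0 if 0 < u <= 1 else 0.0

def psi(u):
    """Haar mother wavelet 1_(0,1/2] - 1_(1/2,1]."""
    if 0 < u <= 0.5:
        return 1.0
    if 0.5 < u <= 1:
        return -1.0
    return 0.0

def hard_threshold_estimate(sample, y, j0, j1, threshold):
    """Evaluate a hard-thresholded Haar wavelet density estimate at y.

    Levels l < j0 are always kept; levels j0 <= l < j1 are kept only if
    the empirical coefficient exceeds threshold(l) in absolute value.
    """
    n = len(sample)
    k = math.ceil(y) - 1  # the only translate of phi whose support contains y
    alpha = sum(phi(x - k) for x in sample) / n
    value = alpha * phi(y - k)
    for l in range(j1):
        k = math.ceil(2 ** l * y) - 1  # the only psi_lk that is nonzero at y
        beta = sum(2 ** (l / 2) * psi(2 ** l * x - k) for x in sample) / n
        if l < j0 or abs(beta) > threshold(l):
            value += beta * 2 ** (l / 2) * psi(2 ** l * y - k)
    return value
```

With `threshold = lambda l: 0.0` every nonzero coefficient is kept and the function returns the linear estimator at resolution `j1`; with a very large threshold it returns the linear estimator at resolution `j0`.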

The linear estimator in (1) can be expressed as a generalized kernel-type 
estimator 

$p_n(y) = \frac{2^{j_n}}{n} \sum_{i=1}^n K(2^{j_n} X_i, 2^{j_n} y)$,

where $K(x,y)$ is the wavelet projection kernel. It is interesting to compare
to other kernel choices. The classical case would be the Parzen-Rosenblatt
kernel density estimator, where $K(x,y) = K(x-y)$ with $K$ some probability
density: if one makes the usual conversion from bandwidth $h$ to $2^{-j_n}$, one can
compare directly with the classical kernel case, and we discuss this in some
detail in Remark 6 below. In a nutshell, while the proof in the wavelet case
follows a pattern similar to the one for classical kernels, some fundamental
difficulties arise due to the fact that $K(x,y)$ is not a "convolution-type"
kernel $K(x-y)$. Most importantly, the size of the random fluctuations of
the (centered) wavelet estimator $p_n(y) - Ep_n(y)$ depends on $y$ not only
through $p_0(y)$, but also through the quantity $\int K^2(2^{j_n} y, u)\,du$, which is part
of the variance term, and which has periodic oscillations on $\mathbb{R}$ (unless one
restricts oneself to the Haar wavelet). Among other things, this requires
a normalization in the law of the logarithm that is not necessary in the 
convolution-kernel case of Stute (1982) and Gine and Guillou (2002). One 
might also be interested in considering projection kernels associated with 
other orthonormal systems that are not of wavelet type, as, for example, 
the Dirichlet kernel (which corresponds to an estimator based on Fourier 
series expansions). While our techniques may apply there as well, these 
kernels are often less interesting for estimating a function in the sup-norm, 
because of approximation-theoretic reasons: for example, the Fourier series 
of a uniformly continuous function might not converge at all points, and 
even if it does, its approximation properties in sup-norm can be suboptimal.

Our proofs are based on techniques from empirical process theory. Note 
that if $p_0$ is compactly supported, or if $y$ varies in a fixed compact set,
then $p_n(y) - Ep_n(y)$ consists of a finite sum of centered empirical wavelet
coefficients, and in this case "finite dimensional" probabilistic methods are
sufficient to analyze the limiting behavior of $p_n$ in the sup-norm. Otherwise,
empirical process methods seem to be required. We show that the classes of 
functions naturally associated to wavelet density estimators are of Vapnik- 
Cervonenkis type, and this allows the effective use of exponential inequal- 
ities for empirical processes [Talagrand (1996)] and entropy-based moment 
bounds [e.g., see Einmahl and Mason (2000), Gine and Guillou (2001)]. We 
also use that bounded subsets of Besov spaces are P-Donsker classes of func- 
tions [Nickl and Potscher (2007)]. Wavelet theory is used throughout, and a 
brief summary of what we need precedes the main results. 

2. Basic setup. 

2.1. Notation. For an arbitrary (nonempty) set $M$, $\ell^\infty(M)$ will denote
the Banach space of bounded real-valued functions $H$ on $M$ normed by
$\|H\|_M := \sup_{m \in M} |H(m)|$, but we will use the symbol $\|h\|_\infty$ to denote
$\sup_{x \in \mathbb{R}} |h(x)|$ for $h: \mathbb{R} \to \mathbb{R}$. For Borel-measurable functions $h: \mathbb{R} \to \mathbb{R}$ and
Borel measures $\mu$ on $\mathbb{R}$, we set $\mu h := \int_{\mathbb{R}} h\,d\mu$, and we denote by $\mathcal{L}^p(\mathbb{R}, \mu)$
the usual Lebesgue spaces of real-valued functions, normed by
$\|\cdot\|_{p,\mu}$. If $d\mu(x) = dx$ is Lebesgue measure, we set $\mathcal{L}^p(\mathbb{R}) := \mathcal{L}^p(\mathbb{R}, \mu)$, and, for
$1 \le p \le \infty$, we abbreviate the norm by $\|\cdot\|_p$. Similarly $\ell^p := \ell^p(\mathbb{Z})$, $1 \le p \le \infty$,
are the usual sequence spaces, and we also denote their norm by $\|\cdot\|_p$ in
a slight abuse of notation. All integrals are over the real line unless stated
otherwise.

Let $X_1, \ldots, X_n$ be i.i.d. random variables with common law $P$ on $\mathbb{R}$, and
denote by $P_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$ the empirical measure. We assume throughout
that the variables $X_i$ are the coordinate projections of $(\mathbb{R}^{\mathbb{N}}, \mathcal{B}^{\mathbb{N}}, P^{\mathbb{N}})$, and we
set $\Pr := P^{\mathbb{N}}$. The empirical process indexed by $\mathcal{F} \subset \mathcal{L}^2(P)$ is given by
$f \mapsto \sqrt{n}(P_n - P)f$, $f \in \mathcal{F}$. Convergence in law of random elements in $\ell^\infty(\mathcal{F})$ is
defined as, for example, in Dudley (1999) or de la Peña and Gine (1999), and
will be denoted by the symbol $\rightsquigarrow_{\ell^\infty(\mathcal{F})}$. The class $\mathcal{F}$ is said to be $P$-Donsker if
the centered Gaussian process $G_P$ with covariance $EG_P(f)G_P(g) = P[(f - Pf)(g - Pg)]$
is sample-bounded and sample-continuous w.r.t. the covariance
semimetric, and if $\sqrt{n}(P_n - P) \rightsquigarrow_{\ell^\infty(\mathcal{F})} G_P$.

2.2. Wavelet expansions and estimators. We recall here some standard
facts from wavelet theory [e.g., see Meyer (1992), Daubechies (1992), Hardle
et al. (1998) or Vidakovic (1999)]. Let $\phi \in \mathcal{L}^2(\mathbb{R})$ be a father wavelet, that
is, $\phi$ is such that $\{\phi(\cdot - k) : k \in \mathbb{Z}\}$ is an orthonormal system in $\mathcal{L}^2(\mathbb{R})$, and
moreover the linear spaces $V_0 = \{f(x) = \sum_k c_k \phi(x-k) : \{c_k\}_{k \in \mathbb{Z}} \in \ell^2\}$,
$V_1 = \{h(x) = f(2x) : f \in V_0\}, \ldots, V_j = \{h(x) = f(2^j x) : f \in V_0\}, \ldots$, are nested
($V_{j-1} \subset V_j$ for $j \in \mathbb{N}$) and such that $\bigcup_{j \ge 0} V_j$ is dense in $\mathcal{L}^2(\mathbb{R})$. For $\phi$ with
compact support and

(5) $K(y,x) := K_\phi(y,x) = \sum_{k \in \mathbb{Z}} \phi(y-k)\phi(x-k)$,

the functions $K_j(y,x) := 2^j K(2^j y, 2^j x)$, $j \in \mathbb{N} \cup \{0\}$, are the kernels of the
orthogonal projections of $\mathcal{L}^2(\mathbb{R})$ onto $V_j$, and we write $K_j(f)(y) = \int K_j(y,x) f(x)\,dx$
for this projection. We will use the following properties repeatedly
throughout the proofs: if $\phi$ (not necessarily a father wavelet) is bounded and
compactly supported, we have [e.g., Hardle et al. (1998), Lemma 8.6]

(6) $|K(y,x)| \le \Phi(y-x)$ and $\sum_k |\phi(\cdot - k)| \in \mathcal{L}^\infty(\mathbb{R})$,

where $\Phi: \mathbb{R} \to \mathbb{R}^+$ is bounded, compactly supported and symmetric.
Furthermore, if $\phi$ is a bounded and compactly supported father wavelet, then,
for every $x$,



(7) $\int K(x,y)\,dy = 1$




E. GINE AND R. NICKL 



[see Corollary 8.1 in Hardle et al. (1998)]; moreover, for $f \in \mathcal{L}^p(\mathbb{R})$, $1 \le p \le \infty$,
and fixed $j$, the series

$K_j(f)(y) = \sum_{k \in \mathbb{Z}} 2^j \phi(2^j y - k) \int \phi(2^j x - k) f(x)\,dx$, $y \in \mathbb{R}$,

converges pointwise (since for each $y$ this is a finite sum). For $f \in \mathcal{L}^1(\mathbb{R})$,
which is the main case in this article, the convergence of the series in fact
takes place in $\mathcal{L}^p(\mathbb{R})$, $1 \le p \le \infty$. [For the reader's convenience, here is a
proof: since $j$ is fixed, we can assume $j = 0$. Setting $\alpha_k = \int \phi(x-k) f(x)\,dx$
we have $\int K_0(f)(x) \phi(x-k)\,dx = \alpha_k$ by compactness of the support of $\phi$ and
orthogonality, hence

(8) $\sum_k |\alpha_k| \le \sum_k \int |K_0(f)(x) \phi(x-k)|\,dx \le \sup_x \sum_k |\phi(x-k)| \, \|K_0(f)\|_1 \le c_1 \|\Phi * |f|\|_1 \le c_2 \|f\|_1$

by (6). Therefore, for any $1 \le p \le \infty$, $\sum_k \|\alpha_k \phi(\cdot - k)\|_p \le \|\phi\|_p \sum_k |\alpha_k| < \infty$.]

If now $\phi$ is a father wavelet and $\psi$ the associated mother wavelet so that
$\{\phi(\cdot - k), 2^{l/2} \psi(2^l(\cdot) - k) : k \in \mathbb{Z}, l \in \mathbb{N} \cup \{0\}\}$ is an orthonormal basis of $\mathcal{L}^2(\mathbb{R})$
[see, e.g., Hardle et al. (1998), page 27], then any $f \in \mathcal{L}^p(\mathbb{R})$ has the formal
expansion

(9) $f(y) = \sum_k \alpha_k(f) \phi(y-k) + \sum_{l=0}^\infty \sum_k \beta_{lk}(f) \psi_{lk}(y)$,

where $\psi_{lk}(y) = 2^{l/2} \psi(2^l y - k)$, $\alpha_k(f) = \int f(x) \phi(x-k)\,dx$, $\beta_{lk}(f) = \int f(x) \psi_{lk}(x)\,dx$.
Since $(K_{l+1} - K_l) f = \sum_k \beta_{lk}(f) \psi_{lk}$ [e.g., Hardle et al. (1998),
page 92], the partial sums of the series (9) are in fact given by

(10) $K_j(f)(y) = \sum_k \alpha_k(f) \phi(y-k) + \sum_{l=0}^{j-1} \sum_k \beta_{lk}(f) \psi_{lk}(y)$

and, just as in the previous paragraph, one shows that, if $\phi$, $\psi$ are bounded
and have compact support, then (10) converges pointwise and also in $\mathcal{L}^p(\mathbb{R})$,
$1 \le p \le \infty$, if $f \in \mathcal{L}^1(\mathbb{R})$. If $p < \infty$, and $f \in \mathcal{L}^p(\mathbb{R})$, then convergence in (10)
takes place in $\mathcal{L}^p(\mathbb{R})$ by a similar argument. Now using (6), (7), Minkowski's
inequality for integrals and continuity of translations in $\mathcal{L}^p(\mathbb{R})$, we have
$\|K_j(f) - f\|_p \le \int \Phi(u) \|f(2^{-j} u + \cdot) - f\|_p\,du \to 0$ as $j \to \infty$ for all $f \in \mathcal{L}^p(\mathbb{R})$,
$1 \le p < \infty$, so that convergence of the wavelet series in (9) takes place in
$\mathcal{L}^p(\mathbb{R})$.
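
To make the projection $K_j(f)$ concrete, here is a minimal Python sketch for the Haar father wavelet $\phi = 1_{(0,1]}$ only, in which case $K_j(f)(y)$ reduces to the average of $f$ over the dyadic interval $(k2^{-j}, (k+1)2^{-j}]$ containing $y$. The function name and the convention of passing an antiderivative-style callable `f_integral(a, b)` are assumptions made for this illustration.

```python
import math

def haar_projection(f_integral, j, y):
    """K_j(f)(y) for the Haar father wavelet.

    f_integral(a, b) must return the integral of f over (a, b].
    Only the translate k with k 2^-j < y <= (k+1) 2^-j contributes,
    so the series in k reduces to a single term.
    """
    k = math.ceil(2 ** j * y) - 1
    a, b = k / 2 ** j, (k + 1) / 2 ** j
    return 2 ** j * f_integral(a, b)
```

For $f(x) = x$ the projection at $y$ is the midpoint of the dyadic interval containing $y$, so the pointwise error is at most $2^{-j-1}$, which illustrates the convergence $K_j(f) \to f$ as $j \to \infty$ in a simple case.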

Some regularity conditions on the wavelets $\phi$, $\psi$ will be needed. They
parallel the order and moment conditions for convolution kernels in classical
kernel density estimation. The standard conditions read as follows. Recall
that $D\phi$ is the weak derivative of $\phi$ if $\int \phi\,Df = -\int (D\phi) f$ holds for all
compactly supported infinitely differentiable functions $f: \mathbb{R} \to \mathbb{R}$.

Condition 1 (S). We say that the orthonormal system $\{\phi(\cdot - k), \psi_{lk} : k \in \mathbb{Z}, l \in \mathbb{N} \cup \{0\}\}$
is $S$-regular, if $\phi$ and $\psi$ are bounded and have compact support,
and, if in addition, one of the following two conditions is satisfied:
either (i) the father wavelet $\phi$ has weak derivatives up to order $S$ that are
in $\mathcal{L}^p(\mathbb{R})$ for some $1 \le p \le \infty$; or (ii) the mother wavelet $\psi$ associated to $\phi$
satisfies $\int x^i \psi(x)\,dx = 0$, $i = 0, \ldots, S$.

The Haar wavelets, corresponding to $\phi = 1_{(0,1]}$ and $\psi = 1_{(0,1/2]} - 1_{(1/2,1]}$,
satisfy this condition only for $S = 0$. And, for any given $S$ there exist
compactly supported wavelets $\phi$ and $\psi$ that satisfy Condition 1(S) [e.g.,
Daubechies' wavelets, see Daubechies (1992), Chapter 6, or Hardle et al.
(1998)].

Given $X_1, \ldots, X_n$ i.i.d. with common absolutely continuous law $P$ on $\mathbb{R}$,
the linear wavelet density estimator has the form

(11) $p_n(y) = \frac{1}{n} \sum_{i=1}^n K_{j_n}(y, X_i) = \sum_k \hat\alpha_k \phi(y-k) + \sum_{l=0}^{j_n-1} \sum_k \hat\beta_{lk} \psi_{lk}(y)$, $y \in \mathbb{R}$,

where $K$ is as in (5), $j_n \in \mathbb{N}$ satisfies $j_n \nearrow \infty$ as $n \to \infty$, and where
$\hat\alpha_k = \int \phi(x-k)\,dP_n(x)$, $\hat\beta_{lk} = \int \psi_{lk}(x)\,dP_n(x)$ are the empirical wavelet
coefficients. We note that for $\psi$ compactly supported, there are only finitely
many $k$'s for which these coefficients are nonzero (with the set of
coefficients depending on $y$). Note that, if $\phi = 1_{(0,1]}$, then $p_n$ is just the usual
histogram density estimator (with dyadic binpoints). For general compactly
supported wavelets $\phi$, $\psi$, the estimator $p_n$ was first studied by Doukhan and
Leon (1990) and Kerkyacharian and Picard (1992).
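
The remark that the Haar case reduces to a dyadic histogram can be checked with a small Python sketch. It evaluates the kernel form $p_n(y) = (2^{j}/n) \sum_i K(2^{j} y, 2^{j} X_i)$ for $\phi = 1_{(0,1]}$ and compares it with a histogram on the bins $(k2^{-j}, (k+1)2^{-j}]$. The function names are hypothetical and the code covers the Haar wavelet only.

```python
import math
from collections import Counter

def haar_kernel(y, x):
    """K(y, x) = sum_k phi(y - k) phi(x - k) for phi = 1_(0,1].

    phi(y - k) = 1 exactly when k = ceil(y) - 1, so the sum has at most
    one nonzero term, and it equals 1 iff x and y share the cell (k, k+1].
    """
    return 1.0 if math.ceil(y) == math.ceil(x) else 0.0

def haar_density_estimate(sample, j, y):
    """Linear Haar wavelet estimator in kernel form at resolution j."""
    scale = 2 ** j
    return scale / len(sample) * sum(haar_kernel(scale * y, scale * x) for x in sample)

def dyadic_histogram(sample, j, y):
    """Histogram estimate on the dyadic bins (k 2^-j, (k+1) 2^-j]."""
    scale = 2 ** j
    counts = Counter(math.ceil(scale * x) for x in sample)
    return scale * counts[math.ceil(scale * y)] / len(sample)
```

For instance, with `sample = [0.1, 0.2, 0.6, 0.9]` and `j = 1`, both functions assign the same height to every point of the bin $(0, 1/2]$.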

2.3. Besov spaces. To deal with the approximation error ("bias term")
of wavelet density estimators, and for some proofs, we introduce the Besov
spaces $B^s_{pq}(\mathbb{R})$, which form a general scale of smooth function spaces (that
contain all the classical ones as special cases). Besov spaces can be
defined in several equivalent ways, the classical one being in terms of $\mathcal{L}^p$-$\mathcal{L}^q$
norms of the second differences
$|h|^{-\{s\}} \times (D^{s-\{s\}} f(\cdot + h) + D^{s-\{s\}} f(\cdot - h) - 2 D^{s-\{s\}} f(\cdot))$
of weak derivatives of $f$, where $0 < \{s\} \le 1$ and $s - \{s\} \in \mathbb{N} \cup \{0\}$.
Wavelet bases provide another characterization of these spaces,
hence it is most convenient for our purposes to define them in this way.

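Definition 1. Let $1 \le p, q \le \infty$, $0 < s < S$, $s \in \mathbb{R}$, $S \in \mathbb{N}$. Let $\phi$ be
a bounded, compactly supported father wavelet that satisfies part (i) of
Condition 1(S), and denote by $\alpha_k(f)$ and $\beta_{lk}(f)$ the wavelet coefficients of
$f \in \mathcal{L}^p(\mathbb{R})$. The Besov space $B^s_{pq}(\mathbb{R})$ is defined as the set of functions

$\left\{f \in \mathcal{L}^p(\mathbb{R}) : \|f\|_{s,p,q} := \|\alpha_{(\cdot)}(f)\|_p + \left(\sum_{l=0}^\infty \left(2^{l(s+1/2-1/p)} \|\beta_{l(\cdot)}(f)\|_p\right)^q\right)^{1/q} < \infty\right\}$

with the obvious modification in case $q = \infty$.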

Remark 1 (Properties of Besov spaces). That this definition coincides
with the more classical ones follows, for instance, from Meyer (1992, page
200) or Section 9 in Hardle et al. (1998). The definition is independent of
the choice of $\phi$, $\psi$, and one has the continuous imbedding of $B^r_{pq}(\mathbb{R})$ [defined
via $\phi$ satisfying part (i) of Condition 1(R) with $0 < r < R$] into $B^s_{pq}(\mathbb{R})$
[defined via a possibly different $\phi'$ satisfying part (i) of Condition 1(S) with
$0 < s < S$] for $r > s$. We also recall some well-known relations of $B^s_{pq}(\mathbb{R})$
to classical smooth function spaces [see, e.g., Triebel (1983)]: for example,
$B^s_{pq}(\mathbb{R})$ is continuously imbedded into $\mathcal{L}^p(\mathbb{R})$ for $1 \le p \le \infty$, and, if $C^s(\mathbb{R})$ are
the classical Holder spaces (of $s$-times continuously differentiable functions
in case $s \in \mathbb{N}$), then

(12) $B^s_{\infty 1}(\mathbb{R}) \hookrightarrow C^s(\mathbb{R}) \hookrightarrow B^s_{\infty\infty}(\mathbb{R})$

holds, where the second imbedding is even an identity if $s$ is noninteger; and
one also has the Sobolev type imbedding $B^s_{pq}(\mathbb{R}) \hookrightarrow C^{s-1/p}(\mathbb{R})$ for $s > 1/p$
or $s = 1/p$ and $q = 1$. Further examples are the classical Sobolev spaces
$H^s(\mathbb{R}) = \{f \in \mathcal{L}^2(\mathbb{R}) : |Ff(\cdot)|^2 (1 + |\cdot|^2)^s \in \mathcal{L}^1(\mathbb{R})\}$, where $F$ is the Fourier
transform, for which one has $H^s(\mathbb{R}) = B^s_{22}(\mathbb{R})$; and if $BV(\mathbb{R}) = \{f : v_1(f) < \infty\}$,
where $v_1$ is defined in (13) below, then
$B^1_{11}(\mathbb{R}) \hookrightarrow BV(\mathbb{R}) \cap \mathcal{L}^1(\mathbb{R}) \hookrightarrow B^1_{1\infty}(\mathbb{R})$.

3. Entropy and expectation bounds. In this section we will show that
certain classes of functions related to the kernel $K(y,x) = \sum_{k \in \mathbb{Z}} \phi(y-k) \phi(x-k)$
are VC-type classes of functions, meaning that they have $\mathcal{L}^2(Q)$ covering
numbers of polynomial order, uniformly in all probability measures $Q$.
Using expectation inequalities for VC-classes, we obtain, as an immediate
consequence, a finite sample inequality for the expected value of the
deviation of the wavelet estimator from its mean. Also, these VC-bounds will
be applied in later sections to obtain various exponential inequalities for
wavelet density estimators.

A function $h$ is of bounded $p$-variation on $\mathbb{R}$, $0 < p < \infty$, if

(13) $v_p(h) := \sup\left\{\sum_{i=1}^n |h(x_i) - h(x_{i-1})|^p : n \in \mathbb{N}, -\infty < x_0 < x_1 < \cdots < x_n < \infty\right\}$

is finite. The following lemma, which uses (and generalizes) a result due to
Nolan and Pollard (1987), will be useful in what follows. As usual, for $\mathcal{H}$ a
class of functions in $\mathcal{L}^r(Q)$, $1 \le r < \infty$, $N(\mathcal{H}, \mathcal{L}^r(Q), \varepsilon)$ denotes the minimal
number of $\mathcal{L}^r(Q)$-balls of radius less than or equal to $\varepsilon$, that cover $\mathcal{H}$. The
logarithm of the covering number is the $\mathcal{L}^r(Q)$-metric entropy of $\mathcal{H}$.
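
The definition of $p$-variation in (13) can be explored numerically on a finite grid. The Python sketch below computes the supremum in (13) restricted to partitions whose points lie in a given finite list of sample values, using dynamic programming over the last chosen index. This is an illustration only: for a general function it yields a lower bound for $v_p(h)$, and the function name is hypothetical.

```python
def p_variation(values, p):
    """p-variation of a finite sequence.

    Returns the supremum over index subsequences i_0 < ... < i_m of
    sum_r |values[i_r] - values[i_{r-1}]|^p, computed in O(n^2) time.
    """
    n = len(values)
    # best[j]: largest sum over subsequences whose last index is j
    best = [0.0] * n
    for j in range(1, n):
        best[j] = max(best[i] + abs(values[j] - values[i]) ** p for i in range(j))
    return max(best, default=0.0)
```

For a monotone sequence and $p > 1$ the supremum is attained by the single jump from the first to the last value, whereas for $p = 1$ every refinement gives the same total variation. For the Haar father wavelet sampled as `[0, 1, 1, 0]` the value for $p = 1$ is 2.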

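Lemma 1. Let $h: \mathbb{R} \to \mathbb{R}$ be a function of bounded $p$-variation, $p \ge 1$.
Set

$\mathcal{H} = \{h((\cdot)t - s) : t, s \in \mathbb{R}\}$.

Then $\mathcal{H}$ satisfies

$\sup_Q N(\mathcal{H}, \mathcal{L}^2(Q), \varepsilon) \le \left(\frac{A}{\varepsilon}\right)^v$, $0 < \varepsilon < A$,

with finite positive constants $A, v$ depending only on $h$, and where the
supremum extends over all Borel probability measures $Q$ on $\mathbb{R}$.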



Proof. It is known that $h$ is equal to $g \circ f$ where $f$ is nondecreasing
with range contained in $[0, v_p(h)]$ and $g$ is $1/p$-Holder-continuous on the full
interval $[0, v_p(h)]$ [see Love and Young (1937) and also Dudley (1992), page
1971]. The set $\mathcal{M}$ of dilations and translations of $f$ satisfies the required
entropy bound with $\mathcal{L}^2(Q)$ replaced by $\mathcal{L}^r(Q)$ for any $r > 0$ (where $\|\cdot\|_{r,Q} = \int |\cdot|^r\,dQ$
if $r < 1$), with a constant $A$ that depends only on $r$ times $v_1(f)$
[see Nolan and Pollard (1987) and de la Peña and Gine (1999), page 224,
for $r < 1$]. Since

$\int |g(m_1) - g(m_2)|^2\,dQ \le \int |m_1 - m_2|^{2/p}\,dQ$,

it follows that any $\varepsilon$-covering of $\mathcal{M}$ for $\mathcal{L}^{2/p}(Q)$ induces an $\varepsilon^s$-covering of $\mathcal{H}$
for $\mathcal{L}^2(Q)$ of the same cardinality, for $s = 1/p$ if $2/p \ge 1$ and $s = 1/2$ otherwise, proving
the lemma (for suitable $v$ depending only on $p$). □

We will impose the following condition on the function $\phi$ defining the
kernel $K$ in (5).

Condition 2. $\phi: \mathbb{R} \to \mathbb{R}$ is of bounded $p$-variation for some $1 \le p < \infty$
and vanishes on $(B_1, B_2]^c$ for some $-\infty < B_1 < B_2 < \infty$.

The Haar father wavelet $\phi = 1_{(0,1]}$ is of bounded variation ($p = 1$) and
hence satisfies Condition 2. Furthermore, since any $\alpha$-Holder-continuous
function ($0 < \alpha \le 1$) with compact support is also of bounded $1/\alpha$-variation,
Condition 2 is also satisfied, for example, for all Daubechies' (father) wavelets
[see, e.g., Hardle et al. (1998), Remark 7.1]. It should be noted that not all
Daubechies' wavelets are of bounded variation, but they are all Holder
continuous for some $0 < \alpha \le 1$, which is why the generalization to $p$-variation
of the result of Nolan and Pollard (1987) is useful in the present context.

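Now for $\phi$ satisfying Condition 2, define

(14) $\mathcal{F}_\phi = \left\{\sum_{k \in \mathbb{Z}} \phi(2^j y - k) \phi(2^j(\cdot) - k) : y \in \mathbb{R}, j \in \mathbb{N} \cup \{0\}\right\}$

and

(15) $\mathcal{D}_{\phi,j} = \left\{\sum_{k \in \mathbb{Z}} 2^j \int_{-\infty}^t \phi(2^j y - k)\,dy\; \phi(2^j(\cdot) - k) : t \in \mathbb{R}\right\}$, $j \in \mathbb{N} \cup \{0\}$.

Notice that by (6), both classes have a constant envelope.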

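Lemma 2. Let $\mathcal{G}$ be either $\mathcal{F}_\phi$ or $\mathcal{D}_{\phi,j}$, where $\phi$ satisfies Condition 2.
Then we have the uniform entropy bound

(16) $\sup_Q N(\mathcal{G}, \mathcal{L}^2(Q), \varepsilon) \le \left(\frac{A}{\varepsilon}\right)^v$, $0 < \varepsilon < A$

for $A$, $v$ positive and finite constants depending only on $\phi$ (and not on $j$ for
$\mathcal{D}_{\phi,j}$), and where the supremum extends over all Borel probability measures
$Q$ on $\mathbb{R}$.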

Proof. The case of $\mathcal{F}_\phi$: for $y, j$ fixed, the sum $\sum_{k \in \mathbb{Z}} \phi(2^j y - k) \phi(2^j(\cdot) - k)$
consists of at most $[B_2 - B_1] + 1$ summands, each of which has the form

$\phi(2^j y - k) \phi(2^j(\cdot) - k) = c_{j,y,k}\, \phi(2^j(\cdot) - k)$,

where $k$ is a fixed integer satisfying $2^j y - B_2 \le k < 2^j y - B_1$, and where
$|c_{j,y,k}| \le \|\phi\|_\infty$. Since $\phi$ is of bounded $p$-variation, Lemma 1 above applies to
the class $\mathcal{M}$ of dilations and translations of $\phi$, yielding the entropy bound
(16) for $\mathcal{M}$ (with different constants $A, v$). The class $\mathcal{F}_\phi$ consists of linear
combinations of at most $[B_2 - B_1] + 1$ elements of $\mathcal{M}$ whose coefficients
are bounded in absolute value by $\|\phi\|_\infty$. For given $\varepsilon' > 0$, take an $\varepsilon'$-dense
subset $\{a_l\}$ of $[-\|\phi\|_\infty, \|\phi\|_\infty]$ and an $\mathcal{L}^2(Q)$-$\varepsilon'$-dense subset $\{m_i(\cdot)\}$ of $\mathcal{M}$.
Then $\left\{\sum_{r=1}^{[B_2-B_1]+1} a_{l_r} m_{i_r}(\cdot)\right\}$ are the centers of a covering of $\mathcal{F}_\phi$ by $\mathcal{L}^2(Q)$
balls of radius $\varepsilon = ([B_2 - B_1] + 1)(\|\phi\|_\infty + 1)\varepsilon'$, and a simple computation
on covering numbers shows that the required entropy bound holds for $\mathcal{F}_\phi$.
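The case of $\mathcal{D}_{\phi,j}$: by the support assumption on $\phi$, we have for every fixed $t$

$\sum_k 2^j \int_{-\infty}^t \phi(2^j y - k)\,dy\; \phi(2^j(\cdot) - k) = c \sum_{k \le 2^j t - B_2} \phi(2^j(\cdot) - k) + \sum_{2^j t - B_2 < k < 2^j t - B_1} c_{j,t,k}\, \phi(2^j(\cdot) - k)$,

where $c = \int_{\mathbb{R}} \phi(y)\,dy$ and $|c_{j,t,k}| \le \|\phi\|_1$. The class of functions

$\left\{\sum_{2^j t - B_2 < k < 2^j t - B_1} c_{j,t,k}\, \phi(2^j(\cdot) - k) : t \in \mathbb{R}\right\}$

satisfies the bound (16) with $A, v$ independent of $j$, by the argument in the
first part of the proof. Each function in the class

$\left\{c \sum_{k \le 2^j t - B_2} \phi(2^j(\cdot) - k) : t \in \mathbb{R}\right\}$

is the difference of two functions, one in each of the classes

$\left\{c \sum_{k \le 2^j t - B_2} \phi^+(2^j(\cdot) - k) : t \in \mathbb{R}\right\}$, $\left\{c \sum_{k \le 2^j t - B_2} \phi^-(2^j(\cdot) - k) : t \in \mathbb{R}\right\}$,

where $\phi = \phi^+ - \phi^-$ and $\phi^+, \phi^- \ge 0$. But these classes are linearly ordered,
so their subgraphs are ordered by inclusion, and therefore are VC-subgraph
of index 1 [cf. Dudley (1999), Theorem 4.2.6.]. The entropy bound for $\mathcal{D}_{\phi,j}$
follows from these observations and, again, a simple computation on covering
numbers. □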



Using expectation bounds for VC-classes of functions [e.g., Einmahl and 
Mason (2000), Gine and Guillou (2001)], the last lemma already implies the 
following result. 



Proposition 1. Let $K(y,x) = \sum_k \phi(y-k) \phi(x-k)$ where $\phi$ satisfies
Condition 2. Suppose that $P$ has a bounded density $p_0$. Let $0 < L < \infty$.
Then, for every $n \in \mathbb{N}$ and every fixed integer $l \ge 0$, $l' = \max(l, 1)$, there
exists a fixed constant $c$ independent of $n, l$ such that

(17) $\sup_{p_0 : \|p_0\|_\infty \le L} E \sup_{y \in \mathbb{R}} |(P_n - P) K_l(y, \cdot)| \le c\left(\sqrt{\frac{2^l l'}{n}} + \frac{2^l l'}{n}\right)$.

If, in addition, $\phi$ is a father and $\psi$ an associated mother wavelet satisfying
Condition 1(0), then, setting $\hat\alpha_k = \int \phi(x-k)\,dP_n(x)$, $\hat\beta_{lk} = \int \psi_{lk}(x)\,dP_n(x)$,
$\alpha_k = \int \phi(x-k)\,dP(x)$ and $\beta_{lk} = \int \psi_{lk}(x)\,dP(x)$, we have

(18) $\sup_{p_0 : \|p_0\|_\infty \le L} E \sup_{k \in \mathbb{Z}} |\hat\alpha_k - \alpha_k| \le \frac{C}{\sqrt{n}}$,
$\sup_{p_0 : \|p_0\|_\infty \le L} E \sup_{k \in \mathbb{Z}} |\hat\beta_{lk} - \beta_{lk}| \le C\left(\sqrt{\frac{l'}{n}} + \frac{2^{l/2} l'}{n}\right)$

for all $l \ge 0$ and a constant $C$ independent of $l, n$.

Proof. We apply Lemma 2 and the expectation inequality (57) below
to the class

$\mathcal{F} = \{K(2^l y, 2^l(\cdot)) - P(K(2^l y, 2^l(\cdot))) : y \in \mathbb{R}\}$,

which satisfies the same entropy bound as $\mathcal{F}_\phi$, and which has constant
envelope $U$ independent of $l$ [using (6)]. To bound second moments, we use
(6) to the effect that

$\sup_y \int K^2(2^l y, 2^l x) p_0(x)\,dx = \sup_y 2^{-l} \int K^2(2^l y, 2^l y + u) p_0(y + 2^{-l} u)\,du$
$\le 2^{-l} \|p_0\|_\infty \sup_y \|K(y, y + \cdot)\|_2^2 \le 2^{-l} L \|\Phi\|_2^2 =: \sigma^2$.

The first claim of the proposition now follows from (57) [and the
measurability Remark 2 below, which implies that the supremum over $y$ in
(17) is in fact countable]. For the second claim, set $\tilde\beta_{lk} = \hat\beta_{lk} - \beta_{lk}$, define
$h(x) := \sum_r \tilde\beta_{lr} \psi_{lr}(x) = (P_n - P)(K_{l+1} - K_l)(x)$, and note that

$\int h \psi_{lk} = \int \sum_r \tilde\beta_{lr} \psi_{lr} \psi_{lk} = \tilde\beta_{lk}$

(since the sum has finitely many terms and the $\psi_{lr}$ are orthogonal).
Consequently,

$\sup_k |\tilde\beta_{lk}| \le \|h\|_\infty \sup_k 2^{l/2} \int |\psi(2^l x - k)|\,dx \le 2^{-l/2} \|\psi\|_1 \|(P_n - P)(K_{l+1} - K_l)\|_\infty$,

which gives the bound (18) for $\hat\beta_{lk} - \beta_{lk}$ by the first part. In the case of
$\hat\alpha_k - \alpha_k$ we have by a similar argument that $\sup_k |\hat\alpha_k - \alpha_k| \le \|(P_n - P) K_0\|_\infty \|\phi\|_1$,
and the result follows from the case $l = 0$ in the first part of the proposition.
□



Inequalities analogous to (17) and (18) hold as well if the first moment is
replaced by $p$th moments. This can be proved either directly, or by combining
the above moment bounds with Proposition 3.1 in Gine, Latala and Zinn
(2000).

If $p_0$ has compact support, then the number of nonzero wavelet
coefficients $\hat\beta_{lk}$, $\beta_{lk}(p_0)$ at level $l$ is finite and of the order $2^l$. In this case, the
above proposition follows from Bernstein's inequality combined with a simple
convexity argument that elaborates on one due to Pisier [cf. van der Vaart
and Wellner (1996), Lemma 2.2.10]. However, if po does not have compact 
support, empirical process methods seem to be unavoidable to prove (17) 
and (18). 

Using the second part of Lemma 2, one can obtain a similar expectation 
bound for the distribution function $F_n^W(s) = \int_{-\infty}^s p_n(y)\,dy$ of the wavelet
density estimator, as we do after Lemma 4 below. 

4. Rates of almost sure uniform consistency for the wavelet density
estimator. We will now derive best possible almost sure rates of convergence
for the deviation of the estimator $p_n(y) = P_n(K_{j_n}(y, \cdot))$ from its mean
$Ep_n(y) = P(K_{j_n}(y, \cdot)) = K_{j_n}(p_0)(y)$ uniformly in $y \in \mathbb{R}$. We also obtain a
uniform law of the logarithm for a suitably scaled version of $p_n - Ep_n$. The
results from this section are compared to similar results for the classical
convolution kernel estimator in Remark 6 below.

For $K(y,x) = \sum_k \phi(y-k) \phi(x-k)$ as in (5), define the function

(19) $\bar K(y,x) := \frac{K(y,x)}{\sqrt{\int K^2(y,u)\,du}}$.

Using (6) and (7), it is easy to see that, if $\phi$ is bounded and compactly
supported then there exist finite non-zero constants $D_1$, $D_2$ independent of
$y$ such that

(20) $D_1^2 \le \int K^2(y,u)\,du \le D_2^2$.

We now proceed to prove the first main result, which is not the most exact,
but requires minimal hypotheses. Let $j_n \nearrow \infty$ be a sequence of nonnegative
integers satisfying the following conditions:

(21) $\frac{n}{j_n 2^{j_n}} \to \infty$, $\frac{j_n}{\log\log n} \to \infty$, $\sup_{n \ge n_0} (j_{2n} - j_n) \le \tau$

for some $\tau \ge 1$ and some $n_0 < \infty$.

Theorem 1. Let $\phi$ be a father wavelet satisfying Condition 2. Suppose
that $P$ has a bounded density $p_0$ and that $j_n$ satisfies (21). Then we have

(22) $\limsup_{n \to \infty} \sqrt{\frac{n}{j_n 2^{j_n}}} \sup_{y \in \mathbb{R}} \left|\frac{p_n(y) - Ep_n(y)}{\sqrt{\int K^2(2^{j_n} y, u)\,du}}\right| = C$ a.s.,

where $C^2 \le M^2 2^\tau \|p_0\|_\infty$ for a constant $M$ that depends only on $D_1$, $D_2$, $\|\Phi\|_\infty$
[cf. (6)] and on the VC-characteristics $A$ and $v$ of $\mathcal{F}_\phi$.



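Proof. Let $n_k = 2^k$. We have

$\Pr\left\{\max_{n_{k-1} < n \le n_k} \frac{1}{\sqrt{n 2^{-j_n} j_n}} \sup_{y \in \mathbb{R}} \left|\sum_{i=1}^n (\bar K(2^{j_n} y, 2^{j_n} X_i) - E\bar K(2^{j_n} y, 2^{j_n} X))\right| > s\right\}$

(23) $\le \Pr\left\{\max_{n_{k-1} < n \le n_k} \sup_{y \in \mathbb{R},\, j_{n_{k-1}} \le j \le j_{n_k}} \left|\sum_{i=1}^n (\bar K(2^j y, 2^j X_i) - E\bar K(2^j y, 2^j X))\right| > s\sqrt{n_{k-1} 2^{-j_{n_k}} j_{n_k}}\right\}$,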

where $j \in \mathbb{N}$. To estimate the last probability, we will apply Talagrand's
inequality to the classes of functions

$\mathcal{F}_k = \{\bar K(2^j y, 2^j(\cdot)) - P(\bar K(2^j y, 2^j(\cdot))) : y \in \mathbb{R}, j_{n_{k-1}} \le j \le j_{n_k}\}$,

which have constant envelope $2\|\Phi\|_\infty / D_1$ [by (6) and (20)] and satisfy the
same entropy bound as $\mathcal{F}_\phi$ in Lemma 2, with a possibly different $A, v$, but
independent of $k$, by that lemma and a simple computation on covering
numbers (since $(\int K^2(2^j y, u)\,du)^{-1} \in [1/D_2^2, 1/D_1^2]$ for all $y \in \mathbb{R}$).
Consequently we may apply inequality (60) below with $U := U_k = 2\|\Phi\|_\infty / D_1$,
and $\sigma_k^2 = 2^{-j_{n_{k-1}}} \|p_0\|_\infty$, where the bound on $\sigma$ follows from

(24) $\sup_y \int \bar K^2(2^j y, 2^j x) p_0(x)\,dx = \sup_y 2^{-j} \frac{\int K^2(2^j y, u) p_0(2^{-j} u)\,du}{\int K^2(2^j y, u)\,du} \le 2^{-j} \|p_0\|_\infty$.

To be precise, we also need that the supremum in (23) is countable, and we
show in Remark 2 below that this is the case. Setting $s = \sqrt{2^{\tau+2} C_1 \|p_0\|_\infty}$
makes $t = s\sqrt{n_{k-1} 2^{-j_{n_k}} j_{n_k}}$ an admissible choice in (60) for all $k$ large enough
by the first and third conditions in (21). As a consequence, for these values
of $k$, the probability in question is bounded from above by

$R \exp\left\{-\frac{1}{C_3} \frac{s^2 n_{k-1} 2^{-j_{n_k}} j_{n_k}}{n_k 2^{-j_{n_{k-1}}} \|p_0\|_\infty}\right\} \le R \exp\left\{-\frac{2 C_1 j_{n_k}}{C_3}\right\}$.

Now the second limit in (21) becomes $j_{n_k} / \log k \to \infty$, hence the last
expression is the general term of a convergent series. Thus, modulo measurability,
we have proved that (for the stipulated $s$),

$\sum_k \Pr\left\{\max_{n_{k-1} < n \le n_k} \frac{1}{\sqrt{n 2^{-j_n} j_n}} \sup_{y \in \mathbb{R}} \left|\sum_{i=1}^n (\bar K(2^{j_n} y, 2^{j_n} X_i) - E\bar K(2^{j_n} y, 2^{j_n} X))\right| > s\right\} < \infty$,

which gives the theorem by Borel-Cantelli and the 0-1 law. □

Remark 2 (Measurability). In order to apply Talagrand's inequality in
the previous theorem, we must show that the supremum in (22) is in fact
a countable supremum. Let $T_1$ be the set of discontinuities of $\phi$, which is
countable since, $\phi$ being of bounded $p$-variation, it is the composition of a
Holder-continuous with a nondecreasing function (see the proof of Lemma 1).
Let $T_0$ be a countable dense subset of $\mathbb{R} \setminus T_1$ and define
$T = \{2^{-j}(z + k) : k \in \mathbb{Z}, z \in T_0 \cup T_1\}$. For each $y$, let
$\phi_y = (\phi(2^j y - k) : k \in \mathbb{Z}) \in \ell^\infty(\mathbb{Z})$. We first prove that

(25) $\{\phi_y : y \in \mathbb{R}\} \subseteq \overline{\{\phi_y : y \in T\}}$,

where the closure is in $\ell^\infty(\mathbb{Z})$. Given $y \in \mathbb{R}$, two cases are possible. Either
$2^j y - k$ is a discontinuity point of $\phi$ for some $k \in \mathbb{Z}$, or $2^j y - k \in T_1^c$ for all
$k$. In the first case, $y \in T$. In the second case, $\phi_y$ can be approximated by
$\phi_{y_\delta}$, $y_\delta \in T$, as follows. Let $k_0$ be the largest integer such that $2^j y - k_0 \ge B_2$,
and $k_N = k_0 + N$ be the smallest integer such that $2^j y - k_N \le B_1$, and set
$k_i = k_0 + i$, $i = 0, \ldots, N$. Note that $N \le B_2 - B_1 + 1$. Let $0 < \delta_0 < 1$ be such
that $\phi$ is continuous on the neighborhood $N_i(\delta_0)$ of $2^j y - k_i$ of radius $\delta_0$,
$i = 0, \ldots, N$. For $\delta < \delta_0$ let $z \in N_0(\delta) \cap T_0$ and define $y_\delta = 2^{-j}(z + k_0) \in T$.
Then $|2^j y - k_i - (2^j y_\delta - k_i)| < \delta$, $i = 0, \ldots, N$. Hence, by continuity of $\phi$ at
$2^j y - k_i$, we have $\max_{0 \le i \le N} |\phi(2^j y - k_i) - \phi(2^j y_\delta - k_i)| \to 0$ as $\delta \to 0$. Since,
moreover, $\phi(2^j y - k) = \phi(2^j y_\delta - k) = 0$ if $k \notin \{k_0, \ldots, k_N\}$, we have $\phi_{y_\delta} \to \phi_y$
in $\ell^\infty(\mathbb{Z})$ as $\delta \to 0$, concluding the proof of (25). Now

$\sum_{i=1}^n (\bar K(2^j y, 2^j X_i) - E\bar K(2^j y, 2^j X)) = \frac{\sum_k c_k \phi(2^j y - k)}{\sqrt{\sum_k \phi^2(2^j y - k)}} =: T(y)$,

where $c_k$ are random variables satisfying $|c_k| \le c < \infty$ for $c$ nonrandom
by (6), and where $\sum_k \phi^2(2^j y - k) \ge D_1^2 > 0$ by (20). Hence if $\phi_{y_\delta} \to \phi_y$ in
$\ell^\infty(\mathbb{Z})$, then $T(y_\delta) \to T(y)$ (as $\delta \to 0$). This, together with (25) proves that
$\sup_{y \in \mathbb{R}} |T(y)| = \sup_{y \in T} |T(y)|$. That is, the supremum in (22) is countable.

Remark 3. The proof of Theorem 1 also shows that, under the conditions of this theorem,

$$\limsup_{n\to\infty}\sqrt{\frac{n}{j_n2^{j_n}}}\,\sup_{y\in\mathbb{R}}|p_n(y)-Ep_n(y)|=C\quad\text{a.s.},\tag{26}$$

where $C^2\le M^22^{\tau}\|\Phi\|_2^2\|p_0\|_\infty$. [The only difference is that in this case we use the variance estimate $\sigma^2=2^{-j}\|p_0\|_\infty\|\Phi\|_2^2$, which follows as in (24).]

The following corollary to the proof of Theorem 1 will be needed for the more exact result below.

Corollary 1. Let $D\subseteq\mathbb{R}$ be such that $\|p_0\|_D>0$. If, in addition to the hypotheses in Theorem 1, $p_0$ is uniformly continuous, then

$$\limsup_{n\to\infty}\sqrt{\frac{n}{j_n2^{j_n}}}\,\sup_{y\in D}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|=C\quad\text{a.s.},$$

where $C^2\le M^22^{\tau}\|p_0\|_D$ and where $M$ is as in Theorem 1.

Proof. The proof is as in Theorem 1, after observing that for every $\varepsilon>0$ and $k$ large enough, the bound in (24), for $y\in D$, becomes $\sigma_k^2=(1+\varepsilon)2^{-j_{n_{k-1}}}\|p_0\|_D$, by uniform continuity of $p_0$ and since, for all $x$, $K(x,x+u)=0$ if $|u|>B_2-B_1$. □

To obtain the exact constant in the almost sure limit, we now proceed to 
give a lower bound. 

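Proposition 2. Let $\phi$ be a bounded father wavelet vanishing on $(B_1,B_2]^c$, $-\infty<B_1<B_2<\infty$, and assume that $P$ has a bounded continuous density $p_0$. Then, if $j_n/\log\log n\to\infty$, we have

$$\liminf_{n\to\infty}\sqrt{\frac{n}{(2\log2)j_n2^{j_n}}}\,\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\ge\|p_0\|_\infty^{1/2}\quad\text{a.s.}$$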



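Proof. By Proposition 2 in Einmahl and Mason (2000), the conclusion holds if, for every $\varepsilon>0$ and $n$ large enough (depending on $\varepsilon$), we can find $k_n=k_n(\varepsilon)$ points $z_{in}=z_{in}(\varepsilon)$, $i=1,\dots,k_n$, such that, if

$$g_i^{(n)}(x)=g_i^{(n,\varepsilon)}(x)=\frac{K(2^{j_n}z_{in},2^{j_n}x)}{\sqrt{\sum_k\phi^2(2^{j_n}z_{in}-k)}},$$

then the following conditions hold (for all $n$ large enough and for constants $r$, $\mu_i$, $\sigma_i$, $i=1,2$, depending on $\varepsilon$):

(a) $\Pr\{g_i^{(n)}(X)\ne0,\ g_k^{(n)}(X)\ne0\}=0$, $i\ne k$,

(b) $\sum_{i=1}^{k_n}\Pr\{g_i^{(n)}(X)\ne0\}\le1/2$,

(c) $2^{-j_n}k_n\to r\in(0,\infty)$,

(d) $2^{-j_n}\mu_1\le E(g_i^{(n)}(X))\le2^{-j_n}\mu_2$ for some $-\infty<\mu_1<\mu_2<\infty$,

(e) $\sigma_12^{-j_n/2}\le\sqrt{\operatorname{Var}(g_i^{(n)}(X))}\le\sigma_22^{-j_n/2}$ for some $0<\sigma_1<\sigma_2<\infty$,

(f) $\sup_{i,n}\|g_i^{(n)}\|_\infty<\infty$,

(g) $\lim_{\varepsilon\to0}\sigma_1(\varepsilon)=\lim_{\varepsilon\to0}\sigma_2(\varepsilon)=\|p_0\|_\infty^{1/2}$.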
We proceed to verify these conditions. Given $\varepsilon>0$, let $I$ be an interval such that $p_0(x)>(1-\varepsilon)\|p_0\|_\infty$ for all $x\in I$, and such that $\Pr\{X\in I\}\le1/2$. Such an interval exists because $p_0$ is bounded and continuous. Set $I=[a,b]$ and define

$$z_{in}=a+3(B_2-B_1)i2^{-j_n},\quad\text{where}\quad i=1,2,\dots,\left[\frac{(b-a)2^{j_n}}{3(B_2-B_1)}\right]-1=:k_n.$$

For (a) note that $K(2^{j_n}z_{in},2^{j_n}x)\ne0$ implies $|x-z_{in}|2^{j_n}\le B_2-B_1$, and that $|z_{in}-z_{kn}|\ge2^{-j_n}3|B_2-B_1|$ for $i\ne k$ by construction, which together imply that the set in question is empty. For (b) note that by (a) the sum of the probabilities in (b) is $\Pr(\bigcup_{i=1}^{k_n}\{g_i^{(n)}(X)\ne0\})\le\Pr\{X\in I\}\le1/2$. By construction, the limit in (c) is $(b-a)/(3(B_2-B_1))$. Condition (f) follows immediately from (6) and the assumption on $\phi$. Conditions (d) and (e) are implied by the following estimates. First,

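$$\int\frac{|K(2^{j_n}z_{in},2^{j_n}x)|}{\sqrt{\sum_k\phi^2(2^{j_n}z_{in}-k)}}\,p_0(x)\,dx\le D_1^{-1}\int|K(2^{j_n}z_{in},2^{j_n}x)|\,p_0(x)\,dx$$

$$\le2^{-j_n}D_1^{-1}\int|K(2^{j_n}z_{in},2^{j_n}z_{in}+u)|\,p_0(z_{in}+u2^{-j_n})\,du\le2^{-j_n}D_1^{-1}\|p_0\|_\infty\|\Phi\|_1,\tag{27}$$

where we use (20) in the first inequality and (6) in the last, and

$$\int\frac{K^2(2^{j_n}z_{in},2^{j_n}x)}{\sum_k\phi^2(2^{j_n}z_{in}-k)}\,p_0(x)\,dx\le2^{-j_n}\|p_0\|_\infty$$

by (24), which give the upper bounds in (d) and (e) with $\mu_2=D_1^{-1}\|p_0\|_\infty\|\Phi\|_1$ and $\sigma_2^2=\|p_0\|_\infty$. Second, for the lower bound in (d), again using (6), (7) and (20),

$$\int\frac{K(2^{j_n}z_{in},2^{j_n}x)}{\sqrt{\sum_k\phi^2(2^{j_n}z_{in}-k)}}\,p_0(x)\,dx\ge D_2^{-1}\int K(2^{j_n}z_{in},2^{j_n}x)\|p_0\|_\infty\,dx-D_1^{-1}\int|K(2^{j_n}z_{in},2^{j_n}x)|\,\bigl|\|p_0\|_\infty-p_0(x)\bigr|\,dx$$

$$\ge2^{-j_n}D_2^{-1}\|p_0\|_\infty(1-\varepsilon\|\Phi\|_1),$$

which gives $\mu_1=D_2^{-1}\|p_0\|_\infty(1-\varepsilon\|\Phi\|_1)$ in (d). Third, for the lower bound in (e), note that the inequalities (27) give $(E(g_i^{(n)}(X)))^2=O(2^{-2j_n})$, whereas

$$E(g_i^{(n)}(X))^2=2^{-j_n}\int\frac{K^2(2^{j_n}z_{in},2^{j_n}z_{in}+u)}{\sum_k\phi^2(2^{j_n}z_{in}-k)}\,p_0(z_{in}+u2^{-j_n})\,du\ge2^{-j_n}\|p_0\|_\infty(1-\varepsilon),$$

since $z_{in}+u2^{-j_n}\in I$ and by construction of $I$. So the lower bound in condition (e) is satisfied with $\sigma_1^2=\|p_0\|_\infty(1-2\varepsilon)$ for all $n$ large enough, which, together with $\sigma_2^2=\|p_0\|_\infty$, gives condition (g). □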



This proposition, Theorem 1 and the bounds (20) already determine the a.s. rate of convergence of $\|p_n-Ep_n\|_\infty$,

$$\sqrt{\frac{n}{j_n2^{j_n}}}\,\|p_n-Ep_n\|_\infty=O_{a.s.}(1)\ \text{and not}\ o_{a.s.}(1),$$

and the same is true for the normalized quantity in Theorem 1. To obtain the exact limit (with normalization), we need the following proposition.

In the next proposition, the (at first sight somewhat awkward) condition $B_1,B_2\in\mathbb{Z}$ is designed to include both the Haar wavelet and any continuous father wavelet with bounded support and bounded $p$-variation.



Proposition 3. Let $\phi$ be a father wavelet that satisfies Condition 2 and is uniformly continuous on $(B_1,B_2]$, where $B_1,B_2\in\mathbb{Z}$. Suppose $P$ has a bounded uniformly continuous density $p_0$. Let $D$ be a bounded subset of $\mathbb{R}$. Then, if $j_n$ satisfies (21), we have

$$\limsup_{n\to\infty}\sqrt{\frac{n}{(2\log2)j_n2^{j_n}}}\,\sup_{y\in D}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\le\|p_0\|_\infty^{1/2}\quad\text{a.s.}$$



Proof. We choose $\lambda\in(1,2)$ and $n'_k=[\lambda^k]$ (where $[a]$ denotes the integer part of $a$). Since $[\lambda^k]\le2[\lambda^{k-1}]$ (as $[\lambda^k]/[\lambda^{k-1}]\to\lambda<2$) for $k$ large enough, it follows that for such $k$, the cardinality of the set $\{2^{-j_n}: n'_{k-1}<n\le n'_k\}$ does not exceed 2 if $\tau$ in (21) equals 1, which we assume, because the proof for larger $\tau$ requires only formal changes to the present proof. Define $n_{k-1}=n'_{k-1}$ if this cardinality is 1, and otherwise let $n_{k-1}$ be the largest integer $n$ such that $j_n=j_{n'_{k-1}}$. Then we have

$$[\lambda^{k-1}]=n'_{k-1}\le n_{k-1}<n'_k\le n_k<n'_{k+1}=[\lambda^{k+1}]\tag{28}$$

and

$$j_n=j_{n'_k}=j_{n_k}\quad\text{for }n_{k-1}<n\le n_k.\tag{29}$$



Let $\delta_m=1/m$ for $m\in\mathbb{N}$. For each given $k$ and $\delta_m$, we consider the following partition of $D$. $D$ is contained in the union of at most $2+\operatorname{diam}(D)/(2^{-j_{n_k}}(B_2-B_1))$ disjoint sets $(2^{-j_{n_k}}(B_1+l),2^{-j_{n_k}}(B_2+l)]$, $l\in\mathbb{Z}$. Then divide each of these intervals into $m(B_2-B_1)$ disjoint left-open right-closed subintervals $I_{ki}$ of length $\delta_m2^{-j_{n_k}}$ and let $z_{ki}$ be the right endpoint of the interval $I_{ki}$ [i.e., $z_{ki}=(B_1+m'\delta_m+l)2^{-j_{n_k}}$ for some $1\le m'\le m$ and some $l\in\mathbb{Z}$]. Then the number $l_k$ of intervals $I_{ki}$ covering $D$ satisfies

$$l_k\le m(B_2-B_1)\left(2+\frac{\operatorname{diam}(D)}{2^{-j_{n_k}}(B_2-B_1)}\right)\le cm2^{j_{n_k}}\tag{30}$$

for some $c$ finite (and $k$ large enough depending on $m$). These intervals $I_{ki}$ also have the following property:

$$\text{If }z\in I_{ki}\text{ and }l\in\mathbb{Z},\text{ then }2^{j_{n_k}}z_{ki}-l\in(B_1,B_2]\iff2^{j_{n_k}}z-l\in(B_1,B_2],\tag{31}$$

and, for each $z_{ki}$, this happens for $B_2-B_1$ integers $l$. As in (24) we have

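$$E\bar K^2(2^{j_{n_k}}z_{ki},2^{j_{n_k}}X)\le2^{-j_{n_k}}\|p_0\|_\infty,$$

where $\bar K(y,x):=K(y,x)/\|K(y,\cdot)\|_2$ denotes the normalized kernel, with $\|K(y,\cdot)\|_2^2=\sum_k\phi^2(y-k)$. Hence the maximal version of Bernstein's inequality [see Einmahl and Mason (1996)] gives that, for all $\eta>0$,

$$\Pr\Biggl\{\max_{1\le i\le l_k}\max_{n_{k-1}<n\le n_k}\Biggl|\sum_{r=1}^n\bigl(\bar K(2^{j_{n_k}}z_{ki},2^{j_{n_k}}X_r)-E\bar K(2^{j_{n_k}}z_{ki},2^{j_{n_k}}X)\bigr)\Biggr|>\sqrt{2(1+\eta)n_k2^{-j_{n_k}}\|p_0\|_\infty\log2^{j_{n_k}}}\Biggr\}$$

$$\le2l_k\exp\Biggl\{-\bigl(2(1+\eta)n_k2^{-j_{n_k}}\|p_0\|_\infty\log2^{j_{n_k}}\bigr)\times\Bigl(2n_k2^{-j_{n_k}}\|p_0\|_\infty+\tfrac23D_1^{-1}\|\Phi\|_\infty\sqrt{2(1+\eta)n_k2^{-j_{n_k}}\|p_0\|_\infty\log2^{j_{n_k}}}\Bigr)^{-1}\Biggr\},$$

which, by the first condition in (21), is dominated by

$$2l_k\exp\Bigl\{-\frac{(1+\eta)\log2^{j_{n_k}}}{1+\eta_k}\Bigr\}\le2cm\bigl(2^{-j_{n_k}}\bigr)^{(1+\eta)/(1+\eta_k)-1}$$

for some $\eta_k\to0$ and $c$ as in (30). This is the general term of a convergent series since $(1+\eta)/(1+\eta_k)-1>0$ for $k$ large enough and by the second condition in (21). Then Borel-Cantelli shows that

$$\limsup_{k\to\infty}\max_{1\le i\le l_k}\max_{n_{k-1}<n\le n_k}\Biggl|\sum_{r=1}^n\bigl(\bar K(2^{j_n}z_{ki},2^{j_n}X_r)-E\bar K(2^{j_n}z_{ki},2^{j_n}X)\bigr)\Biggr|\times\Bigl(\sqrt{2n2^{-j_n}\|p_0\|_\infty\log2^{j_n}}\Bigr)^{-1}\le\lambda\tag{32}$$

almost surely, where we use (28) and (29).

Now consider the class of functions

$$\mathcal{G}_{ki}=\bigl\{\bar K(2^{j_{n_k}}z_{ki},2^{j_{n_k}}(\cdot))-\bar K(2^{j_{n_k}}z,2^{j_{n_k}}(\cdot)):z\in I_{ki}\cap D\bigr\}$$

for $k\in\mathbb{N}$, $1\le i\le l_k$. These classes are of VC-type (with the same VC-characteristics for each $k$ and $i$) by Lemma 2 and the permanence properties of VC-classes. We apply Talagrand's inequality to them, and we must estimate the variance $\sigma^2$ of the functions in these classes: by (29), (31) and change of variables, for $n_{k-1}<n\le n_k$ and $z\in I_{ki}$ we have

$$E\bigl(K(2^{j_n}z_{ki},2^{j_n}X)-K(2^{j_n}z,2^{j_n}X)\bigr)^2\le\|p_0\|_\infty(B_2-B_1)^2\Bigl(\int\phi^2(x)\,dx\Bigr)\omega^2_{\delta_m}(\phi)2^{-j_n},\tag{33}$$

where $\omega_\delta(\phi)$ denotes the $\delta$-modulus of continuity of $\phi$ on $(B_1,B_2]$. The same computation gives

$$\bigl(\|K(2^{j_{n_k}}z_{ki},\cdot)\|_2-\|K(2^{j_{n_k}}z,\cdot)\|_2\bigr)^2\le\int\bigl(K(2^{j_{n_k}}z_{ki},u)-K(2^{j_{n_k}}z,u)\bigr)^2du\le(B_2-B_1)^2\Bigl(\int\phi^2(x)\,dx\Bigr)\omega^2_{\delta_m}(\phi).$$

Using the last two estimates, (20), (24) and that $|a/\alpha-b/\beta|\le\alpha^{-1}|a-b|+|b|(\alpha\beta)^{-1}|\beta-\alpha|$ we obtain

$$E\bigl(\bar K(2^{j_n}z_{ki},2^{j_n}X)-\bar K(2^{j_n}z,2^{j_n}X)\bigr)^2\le C_0\|p_0\|_\infty(B_2-B_1)^2\omega^2_{\delta_m}(\phi)2^{-j_{n_k}}$$

for $n_{k-1}<n\le n_k$ and $z\in I_{ki}$. We set $\sigma_k^2=C_0\|p_0\|_\infty(B_2-B_1)^2\omega^2_{\delta_m}(\phi)2^{-j_{n_k}}=:C'\omega^2_{\delta_m}(\phi)2^{-j_{n_k}}$, and $U=U_k=4D_1^{-1}\|\Phi\|_\infty$. By the first condition in (21) we have

$$\frac{n_k\sigma_k^2}{\log(U/\sigma_k)}\to\infty,$$

and Talagrand's inequality (60) gives, for suitable finite constants $C_3$ and $R$,

$$\Pr\Bigl\{\max_{1\le i\le l_k}\max_{n_{k-1}<n\le n_k}\|n(P_n-P)\|_{\mathcal{G}_{ki}}>C_3\sqrt{n_k\sigma_k^2\log\frac{U}{\sigma_k}}\Bigr\}\le Rl_k\exp\Bigl\{-3\log\frac{U}{\sigma_k}\Bigr\}.$$

(Note that the supremum over $\mathcal{G}_{ki}$ is a countable supremum by Remark 2.) Now, for a fixed constant $L'$,

$$l_k\exp\Bigl\{-3\log\frac{U}{\sigma_k}\Bigr\}\le cm2^{j_{n_k}}\Bigl(\frac{\sigma_k}{U}\Bigr)^3\le L'm\,\omega^3_{\delta_m}(\phi)2^{-j_{n_k}/2},$$

which, by the second limit in (21) and by (28), is the general term of a convergent series. Then, for $n_{k-1}<n\le n_k$, and $k,m$ large enough, one has

$$n_k\sigma_k^2\log\frac{U}{\sigma_k}\le2\lambda^2C'n2^{-j_n}\log(2^{j_n})\,\omega^2_{\delta_m}(\phi).$$

It then follows by Borel-Cantelli that

$$\limsup_{k\to\infty}\max_{1\le i\le l_k}\max_{n_{k-1}<n\le n_k}\frac{\|n(P_n-P)\|_{\mathcal{G}_{ki}}}{\sqrt{n2^{-j_n}\log2^{j_n}}}\le C_3\sqrt{2C'}\,\lambda\,\omega_{\delta_m}(\phi)\tag{34}$$

almost surely, for all $\lambda$ and for all $m$ large enough. Now combining (32) and (34) we have for these $m$

$$\limsup_{n\to\infty}\frac{\bigl\|\sum_{i=1}^n\bigl(\bar K(2^{j_n}(\cdot),2^{j_n}X_i)-E\bar K(2^{j_n}(\cdot),2^{j_n}X)\bigr)\bigr\|_D}{\sqrt{2n2^{-j_n}\|p_0\|_\infty\log2^{j_n}}}\le\lambda+C_3\sqrt{\frac{C'}{\|p_0\|_\infty}}\,\lambda\,\omega_{\delta_m}(\phi)\quad\text{a.s.},$$

and the result follows by letting $\lambda\to1$ and $m\to\infty$. □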

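Summarizing, we have basically proved the main theorem of this section:

Theorem 2. If $\phi$, $P$ and $\{j_n\}$ are as in Proposition 3, then

$$\lim_{n\to\infty}\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\,\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|=\|p_0\|_\infty^{1/2}\quad\text{a.s.}$$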



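Proof. Define $D_\varepsilon=\{x\in\mathbb{R}:p_0(x)>\varepsilon,\ |x|<1/\varepsilon\}$ for $\varepsilon>0$. Applying Proposition 3 to $D_\varepsilon$ and Corollary 1 to $D_\varepsilon^c$, we obtain

$$\limsup_{n\to\infty}\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\,\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\le\|p_0\|_\infty^{1/2}+M2^{\tau/2}\|p_0\|_{D_\varepsilon^c}^{1/2}\quad\text{a.s.}$$

for all $\varepsilon>0$. Now, since $\|p_0\|_{D_\varepsilon^c}\to0$ as $\varepsilon\to0$ by uniform continuity of $p_0$, this lim sup does not exceed $\|p_0\|_\infty^{1/2}$. This and Proposition 2 prove the theorem. □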
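For the Haar father wavelet the statistic in Theorem 2 can be computed exactly from bin counts, which makes the theorem easy to explore numerically. The sketch below is illustrative only: the function names `haar_bin` and `normalized_sup_deviation`, the restriction of the supremum to an interval $(lo, hi]$ with integer endpoints, and the requirement that the caller supplies the distribution function `cdf` of $P$ are assumptions made for this example, not part of the paper.

```python
import math


def haar_bin(x, j):
    """Index k of the dyadic bin (k 2^-j, (k+1) 2^-j] that contains x."""
    return math.ceil(x * 2.0 ** j) - 1


def normalized_sup_deviation(sample, j, cdf, lo, hi):
    """Theorem 2 statistic for the Haar estimator, with sup over y in (lo, hi].

    lo and hi are integers, so (lo, hi] is a union of dyadic bins for j >= 1.
    For Haar, sum_k phi^2(2^j y - k) = 1 and p_n is constant on each bin,
    so the supremum over y reduces to a maximum over bins.
    """
    n = len(sample)
    counts = {}
    for x in sample:
        k = haar_bin(x, j)
        counts[k] = counts.get(k, 0) + 1
    h = 2.0 ** (-j)
    worst = 0.0
    for k in range(lo * 2 ** j, hi * 2 ** j):
        p_hat = counts.get(k, 0) / (n * h)            # p_n on bin k
        p_bar = (cdf((k + 1) * h) - cdf(k * h)) / h   # E p_n on bin k
        worst = max(worst, abs(p_hat - p_bar))
    # Normalization sqrt(n / (2 (log 2) j 2^j)) from Theorem 2.
    return math.sqrt(n / (2.0 * math.log(2.0) * j * 2.0 ** j)) * worst
```

With a large i.i.d. sample from a density with known distribution function (for instance the standard logistic, for which $\|p_0\|_\infty^{1/2}=1/2$), the returned value can be compared with $\|p_0\|_\infty^{1/2}$. Since the theorem is asymptotic, only rough agreement should be expected at moderate $n$.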



With the natural changes in the variance computations (24) and (33), the 
proof of Theorem 2 implies a result similar to the one in Massiani (2003), 
which is the counterpart for the wavelet density estimator of the classical 
result of Stute (1982) for the Parzen-Rosenblatt estimator. 

Corollary 2. Let $\phi$ and the sequence $j_n$ be as in Proposition 3. Let $D$ be a bounded subset of $\mathbb{R}$ and assume that $p_0$ is uniformly continuous on a neighborhood of $D$ and $\|p_0\|_D\ne0$. Then

$$\lim_{n\to\infty}\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\,\sup_{y\in D}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|=\|p_0\|_D^{1/2}\quad\text{a.s.}$$

If furthermore $\inf_{x\in D}p_0(x)>0$, then

$$\lim_{n\to\infty}\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\,\sup_{y\in D}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{p_0(y)\sum_k\phi^2(2^{j_n}y-k)}}\right|=1\quad\text{a.s.}$$



Remark 4 (Moments and Laplace transforms). We note that the a.s. limits in the previous theorem can be complemented by convergence of moments. In fact, direct application of inequality (61) gives that under the conditions of Theorem 1,

$$\sup_nE\exp\left\{\lambda\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\,\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\right\}<\infty\tag{35}$$

for all $\lambda>0$, and the same is true without the normalization by $\sqrt{\sum_k\phi^2(2^{j_n}y-k)}$. This yields enough uniform integrability to obtain that under the conditions of Theorem 2,

$$\lim_nE\exp\left\{\lambda\sqrt{\frac{n}{2(\log2)j_n2^{j_n}}}\,\sup_{y\in\mathbb{R}}\left|\frac{p_n(y)-Ep_n(y)}{\sqrt{\sum_k\phi^2(2^{j_n}y-k)}}\right|\right\}=e^{\lambda\|p_0\|_\infty^{1/2}}\tag{36}$$

for all $\lambda>0$. In particular we obtain convergence of all moments in Theorem 2 (as well as uniform boundedness of all moments in Theorem 1 and in Remark 3).

Remark 5 (Haar wavelet and normalization). Theorem 2 (and Corollary 2) applies to the Haar father wavelet $\phi=1_{(0,1]}$ (which is obviously uniformly continuous on $(B_1,B_2]=(0,1]$). In this case, $\sum_k\phi^2(2^{j_n}y-k)=1$ for all $j,y$. However, if $\phi$ is not constant on $(B_1,B_2]$, the quantity $\sum_k\phi^2(2^{j_n}y-k)=\int K^2(2^{j_n}y,u)\,du$ (although bounded from above and below) depends on $y$ and $j_n$, which is why it must be part of the normalization instead of the limit.
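The dependence of $\sum_k\phi^2(2^{j_n}y-k)$ on $y$ can be checked numerically. The following minimal sketch evaluates the sum for the Haar father wavelet and, purely for contrast, for a hat function on $(0,2]$. The hat function is not a father wavelet (its translates are not orthogonal, which is the situation of Remark 7); it is used here only as a hypothetical example of a $\phi$ that is not constant on its support. All function names are illustrative.

```python
import math


def haar_phi(x):
    """Haar father wavelet: indicator of the half-open interval (0, 1]."""
    return 1.0 if 0.0 < x <= 1.0 else 0.0


def hat_phi(x):
    """Hat function on (0, 2]; an illustrative non-constant phi, not a wavelet."""
    if 0.0 < x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def sum_phi_squared(phi, y, j, support):
    """Sum over k of phi(2^j y - k)^2, for phi vanishing outside (B1, B2]."""
    b1, b2 = support
    u = (2.0 ** j) * y
    # Only integers k with b1 <= u - k <= b2 can contribute.
    k_lo = math.ceil(u - b2)
    k_hi = math.floor(u - b1)
    return sum(phi(u - k) ** 2 for k in range(k_lo, k_hi + 1))
```

For Haar the sum equals 1 at every $y$ and $j$, whereas for the hat function it equals $\theta^2+(1-\theta)^2$ with $\theta$ the fractional part of $2^jy$, so it oscillates between $1/2$ and $1$.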



Remark 6 (Comparison to convolution kernels). The resolution level $j_n$ in wavelet density estimation, more exactly, the quantity $2^{-j_n}$, corresponds to the window width $h_n$ in the classical ("Parzen-Rosenblatt") convolution kernel density estimator. If $K(y,x)=K(y-x)$ then the variance of the estimator $p_n(y)=h_n^{-1}n^{-1}\sum_{i=1}^nK((y-X_i)/h_n)$ is asymptotically of the order $n^{-1}h_n^{-1}p_0(y)\|K\|_2^2$, whereas the order of the variance of the wavelet estimator $p_n(y)$ is $n^{-1}2^{j_n}p_0(y)\int K^2(2^{j_n}y,x)\,dx$, which is different (except for Haar wavelets) since the $\mathcal{L}^2(\mathbb{R})$ norm of $K(y,\cdot)$ oscillates with $y$. Modulo these differences, the a.s. asymptotic behavior of wavelet estimators is similar to that of convolution kernel estimators [cf. Stute (1982), Deheuvels (2000) and Giné and Guillou (2002)]. Regarding proofs, generally, the derivation of Theorems 1 and 2 follows the same pattern as the proofs of Theorems 2.3 and 3.3 in Giné and Guillou (2002); the short proofs of their Theorem 2.3 and of our Theorem 1 consist of a direct application of Talagrand's inequality, moment bounds for VC-type classes of functions and "blocking," and here the differences only reside in the fact that the classes of functions associated with the kernels $K$ are not the same (but in both cases of VC type), in different variance computations, and in measurability considerations. However, the proof of the exact limit law (Theorem 2) is more delicate in the wavelet case. In Proposition 3 above we cannot use continuity of translations and dilations in $\mathcal{L}^2(\mathbb{R})$ as in the upper bound part of Proposition 3.1 in Giné and Guillou (2002). Similarly, the conditions (a)-(g) that we verify in the proof of Proposition 2 also require different methods than those in Einmahl and Mason (2000) or Giné and Guillou (2002).

Remark 7 [Nonorthogonal $\phi(\cdot-k)$'s]. The estimator $p_n(y)=P_n(K_{j_n}(y,\cdot))$ for $K(y,x)=\sum_k\phi(y-k)\phi(x-k)$ makes sense even if $\phi$ is not a father wavelet, that is, the $\phi(\cdot-k)$ need not form an orthogonal system. Assuming $\phi$ satisfies Condition 2 and $\inf_y\|K(y,\cdot)\|_2>0$, then the results proved so far in this section still hold true both for $\|p_n-Ep_n\|_\infty$ and for $\sup_y|p_n(y)-Ep_n(y)|/\|K(2^{j_n}y,\cdot)\|_2$.

4.1. Approximation error and optimal rates of convergence. The previous results were formulated for the deviation of $p_n$ from $Ep_n$, whereas the quantity of statistical interest is $p_n-p_0$. The "bias" $Ep_n-p_0=K_{j_n}(p_0)-p_0$ is nonrandom and can be dealt with by using standard results on approximation of functions by wavelets. If $p_0$ is uniformly continuous then, by (6), (7) and Minkowski's inequality for integrals, $\|K_l(p_0)-p_0\|_\infty\le\int\Phi(u)\|p_0(2^{-l}u+\cdot)-p_0\|_\infty\,du\to0$ for $l\to\infty$ and $\phi$ satisfying Condition 1(0), so that if also Condition 2 holds, then

$$\|p_n-p_0\|_\infty=o_{a.s.}(1)$$

by Remark 3 if one chooses $2^{j_n}\simeq n/(\log n)^{1+\delta}$ for some $\delta>0$.

If more is known on the smoothness of $p_0$ one can obtain rates of convergence. The approximation error in sup-norm loss of a function $f$ by wavelets is related to containment of $f$ in the Besov space $B^t_{\infty\infty}(\mathbb{R})$. Recall from (12) that these spaces include the more classical Hölder spaces $C^t(\mathbb{R})$. A function $p_0$ in $B^t_{\infty\infty}(\mathbb{R})$ is approximated by its projection $K_j(p_0)$ in the uniform norm at rate $2^{-jt}$ if $\phi$ has some regularity. To be precise, if $\phi,\psi$ satisfy Condition 1(T) for $0<t<T+1$, and if $p_0\in B^t_{\infty\infty}(\mathbb{R})$, then the bounds

$$\|K_l(p_0)-p_0\|_\infty\le C2^{-lt}\quad\text{and}\quad\sup_k|\beta_{lk}(p_0)|\le C2^{-l(t+1/2)}\tag{37}$$

can be shown to hold [e.g., Härdle et al. (1998), Theorem 9.4]. Inspection of the proof of that theorem shows that the constant $C$ depends on $p_0$ only through its Besov norm $\|p_0\|_{t,\infty,\infty}$.

As a consequence we have the following theorem. 

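Theorem 3. Assume that the density $p_0$ of $P$ satisfies $p_0\in B^t_{\infty\infty}(\mathbb{R})$ for some $t>0$. Let $p_n$ be as in (11) where $\phi$ satisfies Condition 2, and $\phi,\psi$ are such that Condition 1(T) holds for some $0\le T<\infty$. If $j_n$ satisfies (21), then

$$\sup_{x\in\mathbb{R}}|p_n(x)-p_0(x)|=O_{a.s.}\left(\sqrt{\frac{j_n2^{j_n}}{n}}+2^{-tj_n}\right)$$

and

$$\Bigl(E\sup_{x\in\mathbb{R}}|p_n(x)-p_0(x)|^p\Bigr)^{1/p}=O\left(\sqrt{\frac{j_n2^{j_n}}{n}}+2^{-tj_n}\right)$$

for every $0<t<T+1$ and for every $1\le p<\infty$.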

Proof. This follows from Remarks 3 and 4 and (37). □ 
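To see how the two terms in Theorem 3 trade off, assume (as the theorem is read here) that the bound has the form $\sqrt{j2^{j}/n}+2^{-tj}$. The sketch below picks an integer resolution level with $2^{j}$ close to $(n/\log n)^{1/(2t+1)}$ and returns the two terms separately. The helper names and the rounding-down convention are assumptions made for illustration.

```python
import math


def resolution_level(n, t):
    """Largest integer j >= 0 with 2^j <= (n / log n)^(1 / (2t + 1))."""
    target = (n / math.log(n)) ** (1.0 / (2.0 * t + 1.0))
    return max(0, math.floor(math.log2(target)))


def rate_terms(n, j, t):
    """Stochastic term sqrt(j 2^j / n) and bias term 2^(-t j)."""
    stochastic = math.sqrt(j * 2.0 ** j / n)
    bias = 2.0 ** (-t * j)
    return stochastic, bias
```

Because $j$ must be an integer, the two terms are only balanced up to constants; for example `rate_terms(10**6, resolution_level(10**6, 1.0), 1.0)` returns a stochastic term and a bias term of the same order of magnitude.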

We note that Conditions 1(T) and 2 are satisfied for a large variety of wavelets, for example, the Haar wavelet ($T=0$), or for sufficiently regular Daubechies wavelets (arbitrary $T>0$) [cf. Härdle et al. (1998), Remark 7.1].



Remark 8 (Optimal rates of convergence over general Besov bodies). The last theorem implies that the linear wavelet estimator with $2^{j_n}\simeq(n/\log n)^{1/(2t+1)}$ achieves the optimal [over $B^t_{\infty\infty}(\mathbb{R})$-balls] rate of convergence $((\log n)/n)^{t/(2t+1)}$ in the uniform norm for estimating $p_0$ [see, e.g., Juditsky and Lambert-Lacroix (2004) for optimality of these rates]. One might ask whether the linear wavelet estimator $p_n$ is also best possible if $p_0$ is contained in some space other than $B^t_{\infty\infty}(\mathbb{R})$, for example, in a Besov space $B^t_{pq}(\mathbb{R})$ with $t>1/p$. The continuous (Sobolev-type) imbedding of $B^t_{pq}(\mathbb{R})$ into $B^{t-1/p}_{\infty\infty}(\mathbb{R})$ (see Remark 1) and the choice $2^{j_n}\simeq(n/\log n)^{1/(2(t-1/p)+1)}$ then give $(E\|p_n-p_0\|_\infty^r)^{1/r}=O((\log n/n)^{(t-1/p)/(2(t-1/p)+1)})$, for all $r$, which is still the optimal rate of convergence [see, e.g., Donoho et al. (1996), Theorem 1].

5. Uniform central limit theorems for wavelet density estimators. Consider again the wavelet estimator $p_n(y)$ defined in (11). In this section we study the stochastic process

$$\sqrt{n}\int_{\mathbb{R}}(p_n(y)-p_0(y))f(y)\,dy=\sqrt{n}(P_n^W-P)f,$$

where $f$ varies over some Donsker class $\mathcal{F}$ of functions and where $dP_n^W(y)=p_n(y)\,dy$. Note that $P_n^W(f)=P_n(K_{j_n}(f))$. The classical case is $\mathcal{F}=\{1_{(-\infty,s]}:s\in\mathbb{R}\}$, in which case one has

$$\sqrt{n}(P_n^W-P)f=\sqrt{n}(F_n^W(s)-F(s)),$$

where $F_n^W$ and $F$ are the distribution functions of $p_n$ and $p_0$, respectively. We will show that $\sqrt{n}(P_n^W-P)$ converges in distribution in $\ell^\infty(\mathcal{F})$ to $G_P$, for various Donsker classes $\mathcal{F}$. Our proofs will in fact show $\|P_n^W-P_n\|_{\mathcal{F}}=o_P(1/\sqrt{n})$. The limit theorem for $P_n^W$ can then be inferred from a limit theorem for $P_n$ (using the fact that $\mathcal{F}$ is a Donsker class). In the classical case of $\sqrt{n}(F_n^W-F)$, we will also obtain a Dvoretzky-Kiefer-Wolfowitz type inequality, the compact law of the iterated logarithm, as well as a strong invariance principle.

We set, for ease of notation, $P(K_j)(y)=PK_j(y,\cdot)$, and we will use the symbol $P(K_j)$ both for the function and the finite signed measure that has it as density. The same applies to $P_n(K_j)$. For $f$, a bounded function, the following decomposition will be useful:

$$(P_n^W-P_n)f=(P_n-P)(K_j(f)-f)+(P(K_j)-P)f.\tag{38}$$

The first term is stochastic, whereas the second ("expectation") term is 
deterministic, and we will deal with these two terms separately. 
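The decomposition (38) follows in two lines from the identity $P_n^W(f)=P_n(K_j(f))$ and the symmetry of the kernel $K$. The derivation below is a sketch that assumes $f$ is bounded and that $K_j(f)(x)=\int2^jK(2^jx,2^jy)f(y)\,dy$, in line with the definition of the projection kernel; integrability for Fubini's theorem is taken from the majorization (6).

```latex
\begin{align*}
(P_n^W - P_n)f &= P_n\bigl(K_j(f)\bigr) - P_n f = P_n\bigl(K_j(f) - f\bigr) \\
&= (P_n - P)\bigl(K_j(f) - f\bigr) + P\bigl(K_j(f) - f\bigr), \\
P\bigl(K_j(f)\bigr) &= \int K_j(f)(x)\,p_0(x)\,dx
  = \iint 2^j K(2^j x, 2^j y)\, f(y)\, p_0(x)\,dy\,dx \\
&= \int f(y)\, K_j(p_0)(y)\,dy = P(K_j) f
  \qquad\text{(Fubini and } K(x,y)=K(y,x)\text{)}.
\end{align*}
```

Combining the two displays gives $P(K_j(f)-f)=(P(K_j)-P)f$, which is the second term in (38).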

5.1. CLT and strong invariance principles for the distribution function of the wavelet estimator. We will first treat the classical special case $\mathcal{F}=\{1_{(-\infty,s]}:s\in\mathbb{R}\}$, which corresponds to studying the distribution function

$$F_n^W(s)=\int_{-\infty}^sp_n(y)\,dy$$

of the wavelet density estimator $p_n$. The key result will be an exponential inequality for the random quantity $\|F_n^W-F_n\|_\infty$, where $F_n(s)=\int_{-\infty}^sdP_n$ is the empirical distribution function. This inequality will follow from applying Talagrand's inequality (and Lemma 2) to the centered term in the decomposition (38), but we first must show that the second ("expectation") term is sufficiently small for relevant choices of $j$.

Lemma 3. Let $K(y,x)$ be a projection kernel as in (5) arising from the father wavelet $\phi$, and assume that $\phi,\psi$ satisfy Condition 1(T) for some $0\le T<\infty$. Assume further that the density $p_0$ is a bounded function (in which case we set $t=0$) or that $p_0\in B^t_{\infty\infty}(\mathbb{R})$ for some $t$, $0<t<T+1$. Let $\mathcal{F}=\{1_{(-\infty,s]}:s\in\mathbb{R}\}$. Then the inequality

$$\sup_{f\in\mathcal{F}}|(P(K_j)-P)f|\le C2^{-j(t+1)}$$

holds for some constant $C$ depending only on $\phi$ and $\|p_0\|_{t,\infty}$ (with $\|p_0\|_{0,\infty}=\|p_0\|_\infty$).

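Proof. Using that the wavelet series (9) of $p_0\in\mathcal{L}^1(\mathbb{R})$ converges in $\mathcal{L}^1(\mathbb{R})$, we have

$$K_j(p_0)-p_0=-\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\psi_{lk}$$

in the $\mathcal{L}^1$-sense. Therefore, since $f=1_{(-\infty,s]}\in\mathcal{L}^\infty(\mathbb{R})$, we have

$$(P(K_j)-P)f=\int(K_j(p_0)-p_0)f=-\int\Bigl(\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\psi_{lk}(x)\Bigr)f(x)\,dx$$

$$=-\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\int f(x)\psi_{lk}(x)\,dx=-\sum_{l=j}^\infty\sum_k\beta_{lk}(p_0)\beta_{lk}(f).\tag{39}$$

Thus, we only need to obtain bounds for the wavelet coefficients $\beta_{lk}(p_0)$ and $\beta_{lk}(f)$.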

We first obtain a bound for $f$. We observe that

$$\int(K_{l+1}-K_l)(f)\psi_{lk}=\int\sum_r\beta_{lr}(f)\psi_{lr}\psi_{lk}=\beta_{lk}(f),$$

where the first identity holds by pointwise equality of the integrands, and the second because the sum has only a finite number of nonzero terms (due to compactness of the support of $\psi$) and since the $\psi_{lk}$'s are orthogonal. Therefore, we have, using also (6) with $\psi$ instead of $\phi$,

$$\|\beta_{l(\cdot)}(f)\|_1\le\int\sum_k|(K_{l+1}-K_l)(f)(x)|\,|\psi_{lk}(x)|\,dx\le2^{l/2}\Bigl\|\sum_k|\psi(2^l(\cdot)-k)|\Bigr\|_\infty\|K_{l+1}(f)-K_l(f)\|_1$$

$$\le c2^{l/2}\bigl(\|K_{l+1}(f)-f\|_1+\|K_l(f)-f\|_1\bigr).\tag{40}$$

Now

$$\|K_l(f)-f\|_1=\int\Bigl|\int2^lK(2^ly,2^lx)f(x)\,dx-f(y)\Bigr|\,dy=\int\Bigl|\int2^lK(2^ly,2^lu+2^ly)(f(u+y)-f(y))\,du\Bigr|\,dy$$

$$\le\int\!\!\int2^l\Phi(2^lu)|f(u+y)-f(y)|\,du\,dy=\int\Phi(u)\int|f(2^{-l}u+y)-f(y)|\,dy\,du\le2^{-l}\int\Phi(u)|u|\,du.$$

Since $\Phi$ is bounded and has compact support, we conclude that

$$\sup_{f\in\mathcal{F}}\|\beta_{l(\cdot)}(f)\|_1\le c'2^{-l/2}\tag{41}$$

for some $c'\in(0,\infty)$. For the wavelet coefficients of $p_0$, we have from (37) that $\|\beta_{l(\cdot)}(p_0)\|_\infty\le C2^{-l(t+1/2)}$ for $0<t<T+1$ and some constant $C$. If $t=0$, one has the same bound since

$$|\beta_{lk}(p_0)|\le2^{l/2}\int|\psi(2^lx-k)|p_0(x)\,dx\le2^{-l/2}\|\psi\|_1\|p_0\|_\infty$$

by a simple change of variables.

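Applying these bounds to (39), we have

$$\sup_{f\in\mathcal{F}}\Bigl|\int(K_j(p_0)-p_0)f\Bigr|\le\sup_{f\in\mathcal{F}}\sum_{l\ge j}\|\beta_{l(\cdot)}(p_0)\|_\infty\|\beta_{l(\cdot)}(f)\|_1\le c''2^{-j(t+1)},$$

which completes the proof. □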
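For the Haar kernel the quantity bounded in Lemma 3 can be evaluated in closed form, since $K_j(p_0)$ is the average of $p_0$ over each dyadic bin. The sketch below approximates $\sup_s|(P(K_j)-P)1_{(-\infty,s]}|$ by scanning bin midpoints. The choice of the standard logistic distribution, the scan over midpoints only, and the truncation to $(-8,8]$ are assumptions made for illustration; a second-order Taylor expansion suggests that the midpoint error in a bin of width $h$ is about $|p_0'|h^2/8$, so the error should shrink by a factor of about 4 per level.

```python
import math


def logistic_cdf(x):
    """Distribution function of the standard logistic law (illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-x))


def haar_projection_cdf_error(cdf, j, lo=-8, hi=8):
    """Max over bin midpoints s in (lo, hi] of |(P(K_j) - P) 1_{(-inf, s]}|.

    For Haar, the measure P(K_j) spreads the mass of each dyadic bin
    uniformly over that bin, so up to a point s = a + h/2 inside the bin
    (a, a + h] it has accumulated cdf(a) plus half of the bin's mass.
    """
    h = 2.0 ** (-j)
    worst = 0.0
    for k in range(lo * 2 ** j, hi * 2 ** j):
        a = k * h
        mass = cdf(a + h) - cdf(a)
        s = a + 0.5 * h
        err = abs(0.5 * mass - (cdf(s) - cdf(a)))
        worst = max(worst, err)
    return worst
```

At bin endpoints the error is exactly zero for Haar, which is why the scan uses midpoints. The observed decay of order $2^{-2j}$ for a smooth density sits at the boundary of the range $t<T+1=1$ covered by the lemma for the Haar wavelet.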



Using Lemmas 2 and 3 one can prove the following inequality, which is similar to Theorem 1 in Giné and Nickl (2009).

Lemma 4. Let $F_n(s)=\int_{-\infty}^sdP_n$ and $F_n^W(s)=\int_{-\infty}^sp_n(y)\,dy$, where $p_n$ is as in (11), $\phi$ satisfies Condition 2, and $\phi,\psi$ are such that Condition 1(T) holds for some $0\le T<\infty$. Assume further that the density $p_0$ of $P$ is a bounded function (in which case we set $t=0$) or that $p_0\in B^t_{\infty\infty}(\mathbb{R})$ for some $t$, $0<t<T+1$. Let $j$ satisfy $2^{-j}\ge d(\log n)/n$ for some $0<d<\infty$. Then there exist finite positive constants $L:=L(\|p_0\|_\infty,\phi,d)$, $\Lambda_0:=\Lambda_0(\|p_0\|_{t,\infty},\phi,d)$ such that for all $n\in\mathbb{N}$ and $\lambda\ge\Lambda_0\max(\sqrt{j2^{-j}},\sqrt{n}2^{-j(t+1)})$ we have

$$\Pr\bigl(\sqrt{n}\|F_n^W-F_n\|_\infty>\lambda\bigr)\le L\exp\Bigl\{-\frac{\min(2^j\lambda^2,\sqrt{n}\lambda)}{L}\Bigr\}.$$

Proof. Set $\mathcal{F}=\{1_{(-\infty,s]}:s\in\mathbb{R}\}$. Using the decomposition (38) and Lemma 3 we have

$$\Pr\bigl(\sqrt{n}\|F_n^W-F_n\|_\infty>\lambda\bigr)\le\Pr\Bigl(n\sup_{f\in\mathcal{F}}|(P_n-P)(K_j(f)-f)|>\frac{\sqrt{n}\lambda}{2}\Bigr)$$

by assumption on $\lambda$ (if we take $\Lambda_0\ge2C$), and we will apply Talagrand's inequality to the class

$$\mathcal{F}_j=\{K_j(f)-f-P(K_j(f)-f):f\in\mathcal{F}\}$$

(which is a VC-type class by Lemma 2 and since $\mathcal{F}$ is VC) to bound the last probability. Notice that the supremum over $f\in\mathcal{F}$ is in fact over a countable set by left continuity of the function $s\mapsto K_j(1_{(-\infty,s]})-1_{(-\infty,s]}$. Using the fact that $K$ is majorized by a convolution kernel $\Phi$ [cf. (6)], one establishes by the same arguments as in the proof of Theorem 1 in Giné and Nickl (2009) that $\mathcal{F}_j$ has constant envelope $U=c\|\Phi\|_1$ and that

$$\sup_{f\in\mathcal{F}}\|K_j(f)-f\|_{2,P}\le c'2^{-j/2}=:\sigma$$

for some $0<c'<\infty$ that depends only on $\|p_0\|_\infty$ and $\Phi$. Therefore, we have $\sigma\le U/2$ and $n\sigma^2\ge C\log(U/\sigma)$. Using (59) we can choose $\Lambda_0$ large enough so that

$$4^{-1}\sqrt{n}\Lambda_0\sqrt{j2^{-j}}\ge E$$

in the notation of the Appendix, which means that $E+\sqrt{n}\lambda/4\le\sqrt{n}\lambda/2$. These conditions and the obvious bound $\log(1+x)\ge((e-1)^{-1}x\wedge1)$ for $x\ge0$ applied to (56) give the result. □

Remark 9 (Asymptotic equivalence of $F_n^W$ and $F_n$). Note that the variance estimate in the previous proof together with Lemma 3 [assuming $(d\log n)/n\le2^{-j}$], implies, using (57),

$$E\sup_{s\in\mathbb{R}}|F_n^W(s)-F_n(s)|\le C\left(\sqrt{\frac{j2^{-j}}{n}}+2^{-j(t+1)}\right),$$

which is $o(1/\sqrt{n})$ if $j=j_n$ is such that $\sqrt{n}2^{-j_n(t+1)}\to0$.

The last remark suggests that, in the most interesting range of $j_n$'s such as $2^{-j_n}\simeq n^{-1/(2t+1)}$, the integrated wavelet density estimator is asymptotically
bound from Lemma 4 is the key to transferring several classical results for 
the empirical distribution function to the cdf of the wavelet density esti- 
mator, and we state below some of the more important results that can be 
obtained in this way. 

For example, Lemma 4 implies a Dvoretzky, Kiefer and Wolfowitz (1956) type exponential bound, up to constants, for the distribution function of the wavelet estimator; namely, there exist universal constants $c_1,c_2$ such that for

$$\Lambda_0\max\bigl(\sqrt{j2^{-j}},\sqrt{n}(2^{-j(t+1)})\bigr)\le\lambda\le\sqrt{n},$$

we have

$$\Pr\bigl(\sqrt{n}\|F_n^W-F\|_\infty>\lambda\bigr)\le c_1\exp\{-c_2\lambda^2\}.\tag{42}$$

We next give the wavelet analogue of Donsker's classical functional CLT for the empirical distribution function.

Theorem 4. Let $\phi,\psi$ and $p_0$ satisfy the conditions of Lemma 4 for some $t\ge0$ and let $j_n$ satisfy $2^{-j_n}\ge d(\log n)/n$ for all $n$ and $\sqrt{n}2^{-j_n(t+1)}\to0$ as $n\to\infty$. If $F$ is the distribution function of $P$, then

$$\sqrt{n}(F_n^W-F)\rightsquigarrow_{\ell^\infty(\mathbb{R})}G_P.$$

For the compact law of the iterated logarithm define

$$\mathcal{S}=\Bigl\{x\mapsto\int_{-\infty}^xf\,dP:\int f\,dP=0,\ \int f^2\,dP\le1\Bigr\},$$

the Strassen set.

Theorem 5. Let $\phi,\psi$ and $p_0$ satisfy the conditions of Lemma 4 for some $t\ge0$ and let $j_n$ satisfy $2^{-j_n}\ge d(\log n)/n$ for all $n$ and $\sup_n\sqrt{n}(2^{-j_n})^{t+1}=M<\infty$. Let $F$ be the distribution function of $P$. Then, almost surely, the sequence

$$\Bigl\{\sqrt{\frac{n}{2\log\log n}}\,(F_n^W-F)\Bigr\}_{n=3}^\infty$$

is relatively compact in $\ell^\infty(\mathbb{R})$ and its set of limit points coincides with the Strassen set $\mathcal{S}$.

Finally, we consider the smallest admissible choice of $\lambda$ in Lemma 4. In the case $t=0$ and the largest admissible resolution level $2^{j_n}\simeq n/\log n$, we see that we can take $\lambda\simeq\log n/\sqrt{n}$, the rate occurring in the Komlós, Major and Tusnády (1975) result on strong approximation of $\sqrt{n}(F_n-F)$ by Brownian bridges. Consequently, we have the following strong invariance principle for the integrated wavelet density estimator $F_n^W$.

Theorem 6. Let $\phi,\psi$ and $p_0$ satisfy the conditions of Lemma 4 with $t=0$, and set $2^{-j_n}\simeq(\log n)/n$. Let $F$ be the distribution function of $P$. Then there exists a probability space that supports $X_1,X_2,\dots$ i.i.d. with density $p_0$ and a sequence of Brownian bridges $B_n$ such that, for all $n$ and $x\in\mathbb{R}$,

$$\Pr\bigl(\|\sqrt{n}(F_n^W-F)-B_n\circ F\|_\infty>n^{-1/2}((C+\Lambda_0')\log n+x)\bigr)\le2n^{-2}+Me^{-\eta x},$$

where $C,M,\eta$ are absolute constants and where $\Lambda_0'=\max(2L,\sqrt{2L},\Lambda_0)$, with $\Lambda_0$ and $L$ as in Lemma 4. In particular, for these versions, one has

$$\|\sqrt{n}(F_n^W-F)-B_n\circ F\|_\infty=O_{a.s.}\Bigl(\frac{\log n}{\sqrt{n}}\Bigr).$$

5.2. General UCLTs for wavelet density estimators. The question arises whether $\{1_{(-\infty,s]}:s\in\mathbb{R}\}$ in the last section can be replaced by a more general Donsker class $\mathcal{F}$. Considering the central limit theorem, such results were proved for other density estimators (such as nonparametric maximum likelihood estimators and kernel density estimators) in Nickl (2007) and Giné and Nickl (2008). We show in this section that such results can also be proved for the wavelet estimator $P_n^W$, for many classes $\mathcal{F}$, in particular for balls in general Besov spaces (hence covering Sobolev, Hölder and Lipschitz classes).

In the case of general (Besov) classes of functions, the wavelet structure will be particularly helpful, but before we turn to these classes, we show that Lemma 4 immediately implies the following result for bounded variation classes, since these are in the closed convex hull of indicator functions. A measurable function $f:\mathbb{R}\to\mathbb{R}$ is of bounded variation if $v_1(f)<\infty$, cf. (13), and the class $\mathcal{F}=\{f:\|f\|_\infty+v_1(f)\le1\}$ is a $P$-Donsker class for every $P$ [see, e.g., Dudley (1992)].

Corollary 3. Let $\phi,\psi$ and $p_0$ satisfy the conditions of Lemma 4 for some $t\ge0$. Then, if $\mathcal{F}_R=\{f\text{ right continuous}:\|f\|_\infty+v_1(f)\le1\}$ and $L,\Lambda_0,\lambda,j$ are as in Lemma 4, we have for all $n\in\mathbb{N}$,

$$\Pr\bigl(\sqrt{n}\|P_n^W-P_n\|_{\mathcal{F}_R}>\lambda\bigr)\le L\exp\Bigl\{-\frac{\min(2^j\lambda^2,\sqrt{n}\lambda)}{L}\Bigr\}.$$

If furthermore $\sqrt{n}2^{-j_n(t+1)}\to0$ as $n\to\infty$ and if $\mathcal{F}=\{f:\|f\|_\infty+v_1(f)\le1\}$, then also

$$\sqrt{n}(P_n^W-P)\rightsquigarrow_{\ell^\infty(\mathcal{F})}G_P.$$

Proof. If $f$ is of bounded variation and right continuous, and $f(-\infty+)=0$, then there exists a unique finite Borel measure $\mu_f$ such that $f(x)=\int1_{(-\infty,x]}(v)\,d\mu_f(v)$. Since $(P_n^W-P_n)c=0$ for $c$ constant [see (7)], we may assume that the elements in $\mathcal{F}_R$ all satisfy $f(-\infty+)=0$. We then have from Fubini's theorem [using also (6)], for $f\in\mathcal{F}_R$, that $|(P_n^W-P_n)f|\le\|F_n^W-F_n\|_\infty$. This already proves the first claim of the corollary by Lemma 4. To prove the second claim, observe that any $f\in\mathcal{F}$ is right-continuous except at most at a countable number of points, in particular there exists a right-continuous function $\tilde f$ such that $\tilde f=f$ almost everywhere. Since $P_n^W,P$ are absolutely continuous measures, we have

$$\sqrt{n}(P_n^W-P)f=\sqrt{n}(P_n^W-P)\tilde f=\sqrt{n}(P_n^W-P_n)\tilde f+\sqrt{n}(P_n-P)\tilde f,$$

which proves the second claim by using the first and since $\mathcal{F}_R$ is $P$-Donsker. □

We will now prove a general central limit theorem for the wavelet density 
estimator, uniformly over Besov balls. The proof via the decomposition (38) 
necessitates that these balls be Donsker classes of functions. The following 
Donsker property of balls in $B^s_{pq}(\mathbb{R})$ was proved in Nickl and Pötscher (2007), and can be shown to be essentially sharp [see Nickl (2006)]. Note that under the following conditions on $s,p,q$, the Besov spaces $B^s_{pq}(\mathbb{R})$ can (and will be) viewed as spaces of bounded continuous functions.

Proposition 4. Let $\mathcal{F}$ be a bounded subset of $B^s_{pq}(\mathbb{R})$ where $1\le p\le\infty$, $1\le q\le\infty$, and let $P$ be a probability measure on $\mathbb{R}$. Suppose that one of the following conditions holds:

(a) $1\le p\le2$ and $s>1/p$.

(b) $2<p\le\infty$, $s>1/2$, and $\int|x|^{2\gamma}\,dP(x)<\infty$ for some $\gamma>1/2-1/p$.

(c) $1\le p<2$, $q=1$ and $s=1/p$.

Then $\mathcal{F}$ is $P$-Donsker.

Theorem 7. Let $1\le p,q\le\infty$ and $1/p+1/r=1$. Let $dP_n^W(x)=p_n(x)\,dx$ where $p_n$ is as in (11) and where $\phi,\psi$ satisfy part (i) of Condition 1(T) for some $1\le T<\infty$. For $0<s<T+1$ and $P,s,p,q$ satisfying one of the conditions in Proposition 4, let $\mathcal{F}$ be a bounded subset of $B^s_{pq}(\mathbb{R})$. Assume in addition that $p_0\in\mathcal{L}^r(\mathbb{R})$ (in which case we set $t=0$) or that $p_0\in B^t_{r\infty}(\mathbb{R})$ for some $t$, $0<t<T+1$. Suppose $\sqrt{n}2^{-j_n(t+s)}\to0$ as $n\to\infty$. Then

$$\sqrt{n}(P_n^W-P)\rightsquigarrow_{\ell^\infty(\mathcal{F})}G_P.$$

Proof. We shall use throughout the proof the properties of Besov spaces summarized in Remark 1. Note first that under the conditions of the theorem, if $p=1$, then $s>1$ or $s=q=1$, in which case $\mathcal{F}$ is a bounded subset of $BV(\mathbb{R})$, and the conclusion of the theorem follows from Corollary 3. So we need only consider the case $p>1$.

We will use the decomposition (38) from above, and we first deal with the expectation term. As in (39), we obtain

$$(P(K_j)-P)f=-\sum_{l\ge j}\sum_k\beta_{lk}(p_0)\beta_{lk}(f),$$

where one uses the conjugacy of $p$ and $r$ and the fact that the wavelet series of $p_0\in\mathcal{L}^r(\mathbb{R})$ converges in $\mathcal{L}^r(\mathbb{R})$ if $1\le r<\infty$. If $t>0$, we obtain from [Härdle et al. (1998), Theorem 9.4] that $\|\beta_{l(\cdot)}(p_0)\|_r\le c2^{-l(t+1/2-1/r)}$ for some finite constant $c$. In case $t=0$ this follows from (6) and a computation similar to the one in (40), using Hölder's inequality. Similarly, it follows from the same reference, noting the obvious imbedding of $B^s_{pq}(\mathbb{R})$ into $B^s_{p\infty}(\mathbb{R})$, that we have

$$\sup_{f\in\mathcal{F}}\|\beta_{l(\cdot)}(f)\|_p\le c'2^{-l(s+1/2-1/p)}.\tag{43}$$

Hence the second "expectation" term in (38) is of order

$$\sup_{f\in\mathcal{F}}\Bigl|\int(K_{j_n}(p_0)-p_0)f\Bigr|\le\sup_{f\in\mathcal{F}}\sum_{l\ge j_n}\|\beta_{l(\cdot)}(p_0)\|_r\|\beta_{l(\cdot)}(f)\|_p\le c''\sum_{l\ge j_n}2^{-l(t+s+1-1/r-1/p)}\le c'''2^{-j_n(t+s)}=o(1/\sqrt{n})$$

by the assumption on $j_n$.

It remains to treat the first term in (38). First observe that the class of functions

$$\bigcup_{j\ge0}\mathcal{F}_j':=\bigcup_{j\ge0}\{K_j(f)-f:f\in\mathcal{F}\}$$

is $P$-Donsker: by definition of the Besov norm and (10), we see that for $s'$ such that $\max(1/2,1/p)<s'<\min(s,1)$, $\|K_j(f)\|_{s',p,q}$ is bounded from above by $\|f\|_{s',p,q}\le c\|f\|_{s,p,q}$, uniformly in $j$. Consequently, $\bigcup_{j\ge0}\mathcal{F}_j'$ is contained in a ball of $B^{s'}_{pq}(\mathbb{R})$ of radius at most $c'\sup_{f\in\mathcal{F}}\|f\|_{s,p,q}$ for some constant $0<c'<\infty$, hence it is $P$-Donsker by Proposition 4. So, in order to prove

$$\|P_n-P\|_{\mathcal{F}_{j_n}'}=o_P(1/\sqrt{n}),$$

it suffices to show that the variances $\sup_{f\in\mathcal{F}_{j_n}'}Ef^2(X)$ converge to zero. Since bounded subsets of $B^s_{pq}(\mathbb{R})$ are uniformly bounded classes of functions under the conditions of the theorem, we have

$$E(K_{j_n}(f)(X)-f(X))^2\le c\int|K_{j_n}(f)(x)-f(x)|\,p_0(x)\,dx\le c\|K_{j_n}(f)-f\|_p\|p_0\|_r\tag{44}$$

and this completes the proof since $p_0\in\mathcal{L}^r(\mathbb{R})$ by assumption and since

$$\sup_{f\in\mathcal{F}}\|K_{j_n}(f)-f\|_p\le c'\sup_{f\in\mathcal{F}}\|K_{j_n}(f)-f\|_{s',p,q}=c'\sup_{f\in\mathcal{F}}\Bigl(\sum_{l=j_n}^\infty\bigl(2^{l(s'+1/2-1/p)}\|\beta_{l(\cdot)}(f)\|_p\bigr)^q\Bigr)^{1/q}\to0$$

as $n\to\infty$, by (43). □

Giné and Nickl [(2008), Theorems 5-7, Lemma 12] proved an analogue 
of Theorem 7 and of Corollary 3 for the classical kernel density estimator. 
At first sight the proof there seems somewhat more involved, but it should 
be noted that the proof in the wavelet case relies on nontrivial results such 
as the wavelet characterization of Besov spaces together with the Donsker 
property of Besov balls (Proposition 4), which cannot be used in the case of 
convolution kernels. We should also mention that the case p > 2 (and com- 
pactly supported po) in the above theorem was considered in Nickl (2007) 
for the much more involved case of nonparametric maximum likelihood es- 
timators. 



6. Adaptation in sup-norm loss and the "plug-in property" of thresholding wavelet estimators. The linear wavelet estimator $p_n(y)$ from (11) requires choosing $j_n$, and the choice of $j_n$ that produces optimal results for $p_n$ depends on the smoothness $t$ of the true density $p_0$ (cf. the discussion in Remark 8). From a practical point of view, this is a drawback, as $p_0$ is unknown. A remedy for this problem was suggested in Donoho et al. (1996) by considering so-called "thresholding" wavelet estimators, defined as follows. Note first that we may write, for $j_0 < j_1$, both integers,

$$P_n(K_{j_1}) = P_n(K_{j_0}) + \sum_{l=j_0}^{j_1-1} P_n(K_{l+1} - K_l) = P_n(K_{j_0}) + \sum_{l=j_0}^{j_1-1}\sum_k \hat\beta_{lk}\,\psi_{lk}.$$


Hard thresholding (the only one we will consider) consists of keeping in this sum only those $\hat\beta_{lk}$ that are larger in absolute value than a threshold $\tau$. That is, for $j_1 = j_1(n)$ and $\tau = \tau(n,l)$ the hard thresholding estimator of $p_0$ is given by

$$p_n^T(y) = P_n(K_{j_0}(y,\cdot)) + \sum_{l=j_0}^{j_1-1}\sum_k \hat\beta_{lk}\,I_{[|\hat\beta_{lk}|>\tau]}\,\psi_{lk}(y). \tag{45}$$
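As an illustration of (45), the following sketch computes a hard-thresholded estimator for the Haar basis only, where $\phi = 1_{[0,1)}$ and $\psi = 1_{[0,1/2)} - 1_{[1/2,1)}$. It assumes that the first term of (45) is the level-$j_0$ scaling-function expansion $\sum_k \hat\alpha_{j_0 k}\,2^{j_0/2}\phi(2^{j_0}y - k)$ and uses the threshold $\tau = \kappa\sqrt{l/n}$; the function names `fit_haar_threshold` and `evaluate_density` are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def haar_psi(u):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= u < 0.5:
        return 1.0
    if 0.5 <= u < 1.0:
        return -1.0
    return 0.0


def fit_haar_threshold(sample, j0, j1, kappa):
    """Return (alpha, kept): empirical level-j0 scaling coefficients and, for
    each level l in [j0, j1), the empirical wavelet coefficients whose
    absolute value exceeds tau = kappa * sqrt(l / n)."""
    n = len(sample)
    alpha = defaultdict(float)
    for x in sample:
        # For Haar, only k = floor(2^j0 * x) gives a nonzero phi_{j0,k}(x).
        alpha[math.floor(2 ** j0 * x)] += 2 ** (j0 / 2) / n
    kept = {}
    for l in range(j0, j1):
        beta = defaultdict(float)
        for x in sample:
            u = 2 ** l * x
            k = math.floor(u)
            beta[k] += 2 ** (l / 2) * haar_psi(u - k) / n
        tau = kappa * math.sqrt(l / n)
        kept[l] = {k: b for k, b in beta.items() if abs(b) > tau}
    return dict(alpha), kept


def evaluate_density(y, alpha, kept, j0):
    """Evaluate the thresholded estimator at the point y."""
    value = alpha.get(math.floor(2 ** j0 * y), 0.0) * 2 ** (j0 / 2)
    for l, coeffs in kept.items():
        u = 2 ** l * y
        k = math.floor(u)
        value += coeffs.get(k, 0.0) * 2 ** (l / 2) * haar_psi(u - k)
    return value
```

With `kappa = 0` every nonzero coefficient is kept and the estimator reduces to the histogram with bin width $2^{-j_1}$; with a very large `kappa` only the level-$j_0$ histogram survives. Both limiting cases are convenient sanity checks.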

It is known [e.g., Donoho et al. (1996), Juditsky and Lambert-Lacroix (2004)] that if $\tau$, $j_0$, $j_1$ are chosen in a suitable way (not requiring the knowledge of the smoothness parameter $t$), then $p_n^T$ is rate-adaptive within a logarithmic factor for any $\mathcal L^{p'}$ loss, $1 \le p' < \infty$, that is

$$\sup_{p_0:\ \|p_0\|_{t,p,q}\le L,\ |\mathrm{supp}(p_0)|\le M} E_{p_0}\|p_n^T - p_0\|_{p'}^{p'} \le C(\log n)^{\gamma}\,r_n(t,p'),$$

where $\gamma > 0$, $C$ is a constant and $r_n(t,p')$ is the minimax rate of convergence for estimating a density in the given Besov ball.

We now show, without assuming compact support for $p_0$, that the thresholding wavelet estimator is rate adaptive for sup-norm loss without the logarithmic penalty and that, simultaneously, its distribution function is $\sqrt n$-consistent in the sup norm (in fact, it satisfies the UCLT). The pattern of proof of the result below follows that of the aforementioned authors, but we must use the results from the previous sections in several instances, and we must deal with the unbounded support of $p_0$ by introducing a moment condition for $p_0$ of arbitrarily small order combined with an application of Hoffmann-Jørgensen's inequality.

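For $\kappa > 0$, define the constant

$$c(\kappa) := c(\kappa, \psi, \|p_0\|_\infty) = \frac{\kappa^2}{8\|\psi\|_2^2\|p_0\|_\infty + \dfrac{8}{3\sqrt{\log 2}}\,\kappa\|\psi\|_\infty}.$$

Also, define

$$\mathcal P(L, L', \eta) = \Bigl\{p_0 : \|p_0\|_{t,\infty,\infty} \le L,\ \int |x|^\eta\, p_0(x)\,dx \le L'\Bigr\}.$$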



Theorem 8. Suppose $\phi$ satisfies Condition 2 and $\phi$, $\psi$ are such that Condition 1($T$) holds for some $0 \le T < \infty$. Assume further that the density $p_0$ of $P$ satisfies $p_0 \in B^t_{\infty\infty}(\mathbb R)$ for some $t$, $0 < t < T+1$, and that $\int |x|^\eta\, p_0(x)\,dx < \infty$ for some $\eta > 0$. Let $p_n^T$, $n > 2$, be the thresholding estimator in (45) corresponding to

$$\tau = \tau(n,l) = \kappa\sqrt{l/n},$$

$$2^{j_0} \simeq (n/\log n)^{1/(2(T+1)+1)} \quad\text{and}\quad n/\log n \le 2^{j_1} \le 2n/\log n, \qquad j_0 < j_1,$$

where $\kappa > 0$ is chosen so that $c(\kappa) > (T+3)(1+\eta^{-1})\log 2$. Then

$$\sup_{p_0\in\mathcal P(L,L',\eta)} E_{p_0}\sup_{y\in\mathbb R}|p_n^T(y) - p_0(y)| = O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{t/(2t+1)}\Bigr). \tag{46}$$

Moreover, letting $F_n^T$ and $F$ denote the distribution functions of $p_n^T$ and $p_0$, respectively,

$$\sqrt n\,(F_n^T - F) \rightsquigarrow_{\ell^\infty(\mathbb R)} G_P. \tag{47}$$
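To make the tuning in Theorem 8 concrete, the sketch below picks integer resolution levels and thresholds for a given sample size $n$ and wavelet regularity $T$. The rounding conventions (floor for $j_0$, ceiling for $j_1$) and the function names `resolution_levels` and `threshold` are assumptions made for illustration; the theorem fixes $2^{j_0}$ only up to constants.

```python
import math


def resolution_levels(n, T):
    """Return integers (j0, j1) with 2^j0 of order (n/log n)^(1/(2(T+1)+1))
    and n/log n <= 2^j1 < 2n/log n. Requires n >= 2."""
    ratio = n / math.log(n)
    # Ceiling guarantees ratio <= 2^j1 < 2 * ratio.
    j1 = math.ceil(math.log2(ratio))
    # Floor is one admissible rounding for the coarse level j0.
    j0 = math.floor(math.log2(ratio) / (2 * (T + 1) + 1))
    if not j0 < j1:
        raise ValueError("need j0 < j1; increase n")
    return j0, j1


def threshold(n, l, kappa):
    """Level-dependent threshold tau(n, l) = kappa * sqrt(l / n)."""
    return kappa * math.sqrt(l / n)
```

Neither $j_0$ nor $j_1$ depends on the unknown smoothness $t$; only $T$, a property of the chosen wavelet basis, enters.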



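Proof. Since

$$p_0 = K_{j_0}(p_0) + \sum_{l=j_0}^{j_1-1}(K_{l+1}-K_l)(p_0) - K_{j_1}(p_0) + p_0$$

and since

$$\sum_{l=j_0}^{j_1-1}(K_{l+1}-K_l)(p_0) = \sum_{l=j_0}^{j_1-1}\sum_k \beta_{lk}(p_0)\,\psi_{lk}$$

with the last series converging pointwise (in fact uniformly) because $p_0 \in \mathcal L^1(\mathbb R)$, we have

$$\|p_n^T - p_0\|_\infty \le \|(P_n - P)(K_{j_0})\|_\infty + \Bigl\|\sum_{l=j_0}^{j_1-1}\sum_k\bigl(\hat\beta_{lk}I_{[|\hat\beta_{lk}|>\tau]} - \beta_{lk}(p_0)\bigr)\psi_{lk}\Bigr\|_\infty + \|K_{j_1}(p_0) - p_0\|_\infty.$$

The expectation of the first term is

$$O\bigl(((\log n)/n)^{(T+1)/(2(T+1)+1)}\bigr) = O\bigl(((\log n)/n)^{t/(2t+1)}\bigr)$$

by (17) and since $t < T+1$. The third term is at most of the order $2^{-j_1 t}$ by (37), and this is $O((\log n/n)^t) = o((\log n/n)^{t/(2t+1)})$.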

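It remains to consider the second term, which can be decomposed as

$$\sum_{l=j_0}^{j_1-1}\sum_k(\hat\beta_{lk}-\beta_{lk})\psi_{lk}\bigl[I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|>\tau/2]} + I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|\le\tau/2]}\bigr] - \sum_{l=j_0}^{j_1-1}\sum_k\beta_{lk}\psi_{lk}\bigl[I_{[|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|>2\tau]} + I_{[|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|\le 2\tau]}\bigr]$$

$$:= (\mathrm{I}) + (\mathrm{II}) - (\mathrm{III}) - (\mathrm{IV}),$$

where we write $\beta_{lk}$ for $\beta_{lk}(p_0)$. We first treat the "large deviations" terms (II) and (III).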
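For (II) we choose $\alpha \in (1, \eta+1)$ such that

$$c(\kappa) > \frac{(T+3)\,\alpha}{\alpha-1}\log 2, \tag{48}$$

which is possible by the condition on $c(\kappa)$, and note,

$$E\sup_x\Bigl|\sum_{l=j_0}^{j_1-1}\sum_k(\hat\beta_{lk}-\beta_{lk})\psi_{lk}(x)\,I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|\le\tau/2]}\Bigr| \le \sum_{l=j_0}^{j_1-1}2^{l/2}\|\psi\|_\infty\sum_k\|\hat\beta_{lk}-\beta_{lk}\|_{\alpha,P} \times \bigl[\Pr\{|\hat\beta_{lk}|>\tau,\ |\beta_{lk}|\le\tau/2\}\bigr]^{1-1/\alpha}. \tag{49}$$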

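Now, since $\sup_x|\psi_{lk}(x)| \le 2^{l/2}\|\psi\|_\infty$ and $E\psi_{lk}^2(X) \le \|\psi\|_2^2\|p_0\|_\infty = \|p_0\|_\infty$, Bernstein's inequality gives, for $l \le j_1 - 1$ (and $n$ large enough),

$$\Pr\{|\hat\beta_{lk}|>\tau,\ |\beta_{lk}|\le\tau/2\} \le \Pr\Bigl\{\Bigl|\frac1n\sum_{i=1}^n\bigl(\psi_{lk}(X_i) - E\psi_{lk}(X)\bigr)\Bigr| > \frac{\kappa}{2}\sqrt{\frac{l}{n}}\Bigr\}$$

$$\le 2\exp\Bigl(-\frac{\kappa^2 l}{8\|p_0\|_\infty + (8/3)\,\kappa\|\psi\|_\infty\sqrt{2^l l/n}}\Bigr) \le 2\exp\Bigl(-\frac{\kappa^2 l}{8\|p_0\|_\infty + (8/(3\sqrt{\log 2}))\,\kappa\|\psi\|_\infty}\Bigr), \tag{50}$$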

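a bound which is independent of $k$. In order to estimate $\sum_k\|\hat\beta_{lk}-\beta_{lk}\|_{\alpha,P}$ we note that, by Hoffmann-Jørgensen's inequality [see the version of Corollary 1.2.7 in de la Peña and Giné (1999)], there exists a universal constant $d(\alpha)$ such that

$$\|\hat\beta_{lk}-\beta_{lk}\|_{\alpha,P} \le d(\alpha)\Bigl[\Bigl\|\max_{1\le i\le n}\frac{|\psi_{lk}(X_i)-E\psi_{lk}(X)|}{n}\Bigr\|_{\alpha,P} + \|\hat\beta_{lk}-\beta_{lk}\|_{1,P}\Bigr]. \tag{51}$$

If $\mathrm{supp}\,\psi \subset [A_1, A_2]$, we have, for the second summand,

$$\sum_k\|\hat\beta_{lk}-\beta_{lk}\|_{1,P} \le 2^{l/2+1}\|\psi\|_\infty\sum_k\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}dP \le 2(A_2-A_1+1)\|\psi\|_\infty 2^{l/2} \tag{52}$$

(since, for $l$ fixed, each $x\in\mathbb R$ is contained in at most $A_2-A_1+1$ intervals $[(A_1+k)/2^l, (A_2+k)/2^l]$). In order to estimate the sum over $k$ of the first summands in (51), we first observe that for each $k$ and $l$, they are bounded by

$$2\Bigl(nE\Bigl|\frac{\psi_{lk}(X)}{n}\Bigr|^\alpha\Bigr)^{1/\alpha} \le n^{-(\alpha-1)/\alpha}\,2^{(l/2)+1}\|\psi\|_\infty\Bigl(\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}dP\Bigr)^{1/\alpha}.$$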



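Let $\mathcal K_1 = \{k : 0 \in [(A_1+k)/2^l, (A_2+k)/2^l]\}$, which consists of at most $A_2 - A_1 + 1$ terms, and set $\mathcal K_2 = \mathbb Z\setminus\mathcal K_1$. Then,

$$\sum_{k\in\mathcal K_1}\Bigl(\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}dP\Bigr)^{1/\alpha} \le (A_2-A_1+1)^{(\alpha+1)/\alpha}\,2^{-l/\alpha}\,\|p_0\|_\infty^{1/\alpha}$$

and

$$\sum_{k\in\mathcal K_2}\Bigl(\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}dP\Bigr)^{1/\alpha} \le \sum_{k\in\mathcal K_2}\frac{1}{\bigl(1+(|A_1+k|\wedge|A_2+k|)/2^l\bigr)^{\eta/\alpha}}\Bigl(\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}(1+|x|)^\eta\,dP(x)\Bigr)^{1/\alpha}$$

$$\le \Bigl(\sum_{k\in\mathcal K_2}\frac{2^{l\eta/(\alpha-1)}}{\bigl(2^l+(|A_1+k|\wedge|A_2+k|)\bigr)^{\eta/(\alpha-1)}}\Bigr)^{1-1/\alpha}\Bigl(\sum_{k\in\mathcal K_2}\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}(1+|x|)^\eta\,dP(x)\Bigr)^{1/\alpha}$$

by Hölder. Since for $\lambda > 1$, $\sum_{k\in\mathcal K_2}\bigl(2^l+(|A_1+k|\wedge|A_2+k|)\bigr)^{-\lambda} \le c_\lambda 2^{-l(\lambda-1)}$ for a constant $c_\lambda$ depending only on $\lambda$, we get

$$\Bigl(\sum_{k\in\mathcal K_2}\frac{2^{l\eta/(\alpha-1)}}{\bigl(2^l+(|A_1+k|\wedge|A_2+k|)\bigr)^{\eta/(\alpha-1)}}\Bigr)^{1-1/\alpha} \le C\,2^{l(\alpha-1)/\alpha}$$

for a constant $C = C_{\eta,\alpha}$ depending only on $\eta$ and $\alpha$. Moreover,

$$\Bigl(\sum_{k}\int_{(A_1+k)/2^l}^{(A_2+k)/2^l}(1+|x|)^\eta\,dP(x)\Bigr)^{1/\alpha} \le (A_2-A_1+1)^{1/\alpha}\bigl(E(1+|X|)^\eta\bigr)^{1/\alpha} < \infty.$$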
Thus, collecting terms,

$$\sum_k\Bigl\|\max_{1\le i\le n}\frac{|\psi_{lk}(X_i)-E\psi_{lk}(X)|}{n}\Bigr\|_{\alpha,P} \le C\,n^{-(\alpha-1)/\alpha}\,2^{l/2}\bigl(2^{-l/\alpha}+2^{l(\alpha-1)/\alpha}\bigr), \tag{53}$$

where $C$ depends on $\alpha$, $\eta$, $\|\psi\|_\infty$, $A_1$, $A_2$ and $\|p_0\|_\infty$. Now, adding (52) and (53) gives a bound for $\sum_k\|\hat\beta_{lk}-\beta_{lk}\|_{\alpha,P}$ by (51), which, combined with inequality (50), and (49), proves that the series in (49) is dominated by

$$C\sum_{l=j_0}^{j_1-1}2^l\bigl[1+2(n^{-1}2^l)^{(\alpha-1)/\alpha}\bigr]e^{-c(\kappa)\,l\,(\alpha-1)/\alpha}, \tag{54}$$



where $C$ depends only on $\alpha$, $\eta$, $\|\psi\|_\infty$, $A_1$, $A_2$ and $\|p_0\|_\infty$. By definition of $j_1$, $n^{-1}2^l \le 2/\log n$ for $l < j_1$, which gives

$$2^l\bigl[1+2(n^{-1}2^l)^{(\alpha-1)/\alpha}\bigr] \le 2^l\bigl(1+2(2/\log n)^{(\alpha-1)/\alpha}\bigr) \le c\,2^l$$

for some $c<\infty$, and, using the definition of $\alpha$ and condition (48) for $c(\kappa)$, we obtain that (54) is bounded by

$$C''\sum_{l=j_0}^{j_1-1}2^{-l(T+1)} \le C'''\,2^{-j_0(T+1)}$$

for suitable constants $C''$ and $C'''$. By the definition of $j_0$ and $T$, we see that this gives the bound

$$O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{(T+1)/(2(T+1)+1)}\Bigr) = O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{t/(2t+1)}\Bigr)$$

for the series in (49), which is what we wanted to prove for term (II).
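For term (III),

$$E\sup_x\Bigl|\sum_{l=j_0}^{j_1-1}\sum_k\beta_{lk}\psi_{lk}(x)\,I_{[|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|>2\tau]}\Bigr| \le \sum_{l=j_0}^{j_1-1}\sum_k 2^{l/2}\|\psi\|_\infty|\beta_{lk}|\Pr\{|\hat\beta_{lk}|\le\tau,\ |\beta_{lk}|>2\tau\}$$

$$\le c'\sum_{l=j_0}^{j_1-1}2^l e^{-c(\kappa)l} \le c'\,2^{-j_0(T+1)} = O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{t/(2t+1)}\Bigr),$$

where we have used that (40) and $\|K_l(p_0)\|_1 \le \|\Phi_l * p_0\|_1 \le \|\Phi\|_1$ [by (6)] imply $\sum_k|\beta_{lk}| \le C2^{l/2}$, and that $\Pr\{|\hat\beta_{lk}|\le\tau,\ |\beta_{lk}|>2\tau\} \le \Pr\{|\hat\beta_{lk}-\beta_{lk}|>\tau\} \le 2\exp\{-c(\kappa)l\}$ by (50) and choice of $\kappa$.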

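Next, we consider (I). We will use (18) and we should note in advance that if $l < j_1$, then $\sqrt{l/n} \ge C\,2^{l/2}\,l/n$, so that $\sqrt{l/n}$ is the dominating term in that bound. Let $j_1(t)$ be such that $j_0 \le j_1(t) \le j_1 - 1$ and $2^{j_1(t)} \simeq (n/\log n)^{1/(2t+1)}$ [such $j_1(t)$ exists by the definitions]. Using (18) and (6) we have

$$E\sup_x\Bigl|\sum_{l=j_0}^{j_1(t)-1}\sum_k(\hat\beta_{lk}-\beta_{lk})\psi_{lk}(x)\,I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|>\tau/2]}\Bigr| \le \sum_{l=j_0}^{j_1(t)-1}E\sup_k|\hat\beta_{lk}-\beta_{lk}|\;2^{l/2}\sup_x\sum_k|\psi(2^lx-k)|$$

$$\le C\sum_{l=j_0}^{j_1(t)-1}\sqrt{\frac{2^l\,l}{n}} = O\Bigl(\sqrt{\frac{2^{j_1(t)}j_1(t)}{n}}\Bigr) = O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{t/(2t+1)}\Bigr).$$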

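For the second part of (I), using the same facts as in the previous computation and that $\sup_k|\beta_{lk}(p_0)| \le c\,2^{-l(t+1/2)}$ by (37), we have [recall the definition of $\tau = \tau(n,l)$]

$$E\sup_x\Bigl|\sum_{l=j_1(t)}^{j_1-1}\sum_k(\hat\beta_{lk}-\beta_{lk})\psi_{lk}(x)\,I_{[|\hat\beta_{lk}|>\tau,\,|\beta_{lk}|>\tau/2]}\Bigr| \le \sum_{l=j_1(t)}^{j_1-1}E\bigl(\sup_k|\hat\beta_{lk}-\beta_{lk}|\bigr)\frac{2}{\kappa}\sqrt{\frac nl}\,\sup_k|\beta_{lk}|\;2^{l/2}\sup_x\sum_k|\psi(2^lx-k)|$$

$$\le C\sum_{l=j_1(t)}^{j_1-1}2^{-lt} = O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{t/(2t+1)}\Bigr).$$

Finally, for term (IV), using (6) and (37) we have

$$\sup_x\Bigl|\sum_{l=j_0}^{j_1-1}\sum_k\beta_{lk}\psi_{lk}(x)\,I_{[|\hat\beta_{lk}|\le\tau,\,|\beta_{lk}|\le 2\tau]}\Bigr| \le c\sum_{l=j_0}^{j_1-1}\sup_k 2^{l/2}|\beta_{lk}|\,I_{[|\beta_{lk}|\le 2\tau]} \le c\sum_{l=j_0}^{j_1-1}\min\bigl(2^{l/2}\delta,\ C2^{-lt}\bigr), \tag{55}$$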

where $\delta = 2\kappa\sqrt{j_1/n} \ge 2\tau$ and where $C$ only depends on the Besov norm $L$ of $p_0$. To estimate this quantity, we use an idea of Donoho et al. [(1997), proof of Theorem 3, see also Delyon and Juditsky (1996)]. Set $W(l) = \min(2^{l/2}\delta,\ C2^{-lt})$. Clearly $\sup_{j_0\le l\le j_1-1}W(l)$ is attained at $l^*$ such that $2^{l^*} = (C/\delta)^{1/(t+1/2)}$, and $W(l^*) = C^{1-r}\delta^r = C2^{-l^*t}$ where $r = t/(t+1/2)$. Hence,

$$W(l)/W(l^*) \le \min\bigl(2^{t(l^*-l)},\ 2^{l^*t+l/2}\,\delta/C\bigr).$$

So the last term in (55) equals

$$c\sum_{l=j_0}^{j_1-1}W(l) \le cW(l^*)\,2^{l^*t}\,\delta C^{-1}\sum_{j_0\le l\le l^*}2^{l/2} + cW(l^*)\sum_{l>l^*}2^{t(l^*-l)}$$

$$\le c'W(l^*)\,2^{l^*(t+1/2)}\,\delta C^{-1} + c'W(l^*) = c''\delta^r = O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{t/(2t+1)}\Bigr).$$

This concludes the proof of (46). 
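The balancing argument for term (IV) can be checked numerically. The sketch below sums $W(l) = \min(2^{l/2}\delta, C2^{-lt})$ over a range of levels and compares the result with $C^{1-r}\delta^r$, $r = t/(t+1/2)$, multiplied by the constant $1/(1-2^{-1/2}) + 1/(1-2^{-t})$. That explicit constant and the helper names `sum_W` and `balance_bound` are assumptions of this sketch, obtained by summing the two geometric series; they are not taken from the paper.

```python
def sum_W(delta, C, t, j0, j1):
    """Sum of W(l) = min(2^(l/2) * delta, C * 2^(-l*t)) for l = j0, ..., j1 - 1."""
    return sum(min(2 ** (l / 2) * delta, C * 2 ** (-l * t))
               for l in range(j0, j1))


def balance_bound(delta, C, t):
    """Upper bound W(l*) * (1/(1 - 2^(-1/2)) + 1/(1 - 2^(-t))) for the sum,
    where W(l*) = C^(1-r) * delta^r and r = t / (t + 1/2)."""
    r = t / (t + 0.5)
    w_star = C ** (1 - r) * delta ** r
    return w_star * (1 / (1 - 2 ** -0.5) + 1 / (1 - 2 ** (-t)))
```

Since $\delta$ is of order $\sqrt{\log n / n}$, the bound $\delta^r$ is of order $(\log n/n)^{t/(2t+1)}$, which is the rate claimed in (46).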

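To prove (47), observe that, with $p_n(j_1)(y) := P_n(K_{j_1}(y,\cdot))$,

$$p_n^T - p_0 = p_n(j_1) - p_0 - \sum_{l=j_0}^{j_1-1}\sum_k\hat\beta_{lk}\,\psi_{lk}\,I_{[|\hat\beta_{lk}|\le\tau]},$$

hence the result follows from Theorem 4 since, with $\mathcal F = \{1_{(-\infty,s]} : s\in\mathbb R\}$,

$$\sup_{f\in\mathcal F}\Bigl|\sum_{l=j_0}^{j_1-1}\sum_k\hat\beta_{lk}\,I_{[|\hat\beta_{lk}|\le\tau]}\,\beta_{lk}(f)\Bigr| \le \sup_{f\in\mathcal F}\sum_{l=j_0}^{j_1-1}\kappa\sqrt{l/n}\sum_k|\beta_{lk}(f)| \le c\sum_{l=j_0}^{j_1-1}2^{-l/2}\kappa\sqrt{l/n} = o(1/\sqrt n),$$

where we use (41). □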

Remark 10 (Choice of $\kappa$). In order to choose $\kappa$ so that the constant $c(\kappa)$ satisfies the lower bound in the theorem, one needs to choose the mother wavelet $\psi$ and know a uniform bound on $\|p_0\|_\infty$. For example, if one takes the Haar basis (and hence $T = 0$), then $\|\psi\|_2 = \|\psi\|_\infty = 1$, and if one knows in addition that $\|p_0\|_\infty \le 1$ and that a moment of $p_0$ of order one or larger exists, then the choice $\kappa = 16$ is admissible and one can adapt to the smoothness of $p_0$ up to degree $t < 1$. If no bound on $\|p_0\|_\infty$ is available, one may replace $\|p_0\|_\infty$ by $\|p_n\|_\infty$, where $p_n$ is chosen with $2^{j_n} \simeq n/(\log n)^2$. One can then adapt arguments from Giné and Nickl (2009) to show that Theorem 8 still holds true for this (random) choice of $\kappa$.
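The numbers in Remark 10 can be reproduced with a few lines of code. The sketch assumes that the constant reads $c(\kappa) = \kappa^2/\bigl(8\|\psi\|_2^2\|p_0\|_\infty + 8\kappa\|\psi\|_\infty/(3\sqrt{\log 2})\bigr)$ and that admissibility means $c(\kappa) > (T+3)(1+\eta^{-1})\log 2$, as in Theorem 8; the function names `c_kappa` and `is_admissible` are hypothetical.

```python
import math


def c_kappa(kappa, psi_l2=1.0, psi_sup=1.0, p0_sup=1.0):
    """Constant c(kappa) for given ||psi||_2, ||psi||_inf and a bound on ||p0||_inf."""
    denom = (8 * psi_l2 ** 2 * p0_sup
             + 8 * kappa * psi_sup / (3 * math.sqrt(math.log(2))))
    return kappa ** 2 / denom


def is_admissible(kappa, T, eta, psi_l2=1.0, psi_sup=1.0, p0_sup=1.0):
    """Check the condition c(kappa) > (T + 3) * (1 + 1/eta) * log 2."""
    lower = (T + 3) * (1 + 1 / eta) * math.log(2)
    return c_kappa(kappa, psi_l2, psi_sup, p0_sup) > lower
```

For the Haar case with $T = 0$, $\eta = 1$ and $\|p_0\|_\infty \le 1$, this check accepts $\kappa = 16$ and rejects $\kappa = 15$, so under the stated assumptions 16 is the smallest admissible integer.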



Remark 11 (Adaptation in the sup-norm). Adaptive estimation of a density in sup-norm loss was considered in Tsybakov (1998) and Golubev, Lepski and Levit (2001), who worked within the framework of the Gaussian white noise model, and adapted over Sobolev balls. Considering the density model on the real line and adaptation over the (in this context) more natural classes $B^t_{\infty\infty}(\mathbb R)$, Giné and Nickl (2009) constructed an estimator using Lepski's method that has the same properties as the hard thresholding estimator from Theorem 8 above.
APPENDIX: TALAGRAND'S INEQUALITY AND MOMENT 
BOUNDS FOR VC CLASSES 

Let $X_1, \dots, X_n$ be i.i.d. with law $P$ on $\mathbb R$, and let $\mathcal F$ be a $P$-centered (i.e., $\int f\,dP = 0$ for all $f \in \mathcal F$) countable class of real-valued functions on $\mathbb R$, uniformly bounded by the constant $U$. Let $\sigma$ be any positive number such that $\sigma^2 \ge \sup_{f\in\mathcal F}E(f^2(X))$, set $E := E\|\sum_{i=1}^n f(X_i)\|_{\mathcal F}$ and set $V := E\|\sum_{i=1}^n f^2(X_i)\|_{\mathcal F} \le n\sigma^2 + 16UE$ [see Talagrand (1994) for the inequality].

Then there exists a universal constant $L$ such that, for every $t > 0$,

$$\Pr\Bigl\{\max_{k\le n}\Bigl\|\sum_{i=1}^k f(X_i)\Bigr\|_{\mathcal F} > E + t\Bigr\} \le L\exp\Bigl\{-\frac{1}{L}\,\frac{t}{U}\log\Bigl(1+\frac{tU}{V}\Bigr)\Bigr\}. \tag{56}$$



This is Talagrand's (1996) inequality, which is usually stated for $\|\sum_{i=1}^n f(X_i)\|_{\mathcal F}$ instead of for the maximum of the partial sums. However, it follows in the stated form because Talagrand's inequality can be proved [e.g., Ledoux (2001), page 144ff] by estimation of the Laplace transform of $\|\sum_{i=1}^n f(X_i)\|_{\mathcal F}$, and $\exp\{\lambda\|\sum_{i=1}^k f(X_i)\|_{\mathcal F}\}$, $k = 1, 2, \dots$, is a submartingale, so that Doob's inequality can be applied [see also Einmahl and Mason (2000, 2005)]. We say that $\mathcal F$ is a VC-type class for the envelope $U$ and with VC-characteristics $A$, $v$ if its $\mathcal L^2(Q)$ covering numbers satisfy that, for all probability measures $Q$ and $\varepsilon > 0$, $N(\mathcal F, \mathcal L^2(Q), \varepsilon) \le (AU/\varepsilon)^v$. For such classes, assuming $Pf = 0$ for $f \in \mathcal F$, there exists a universal constant $L'$ such that

$$E\Bigl\|\sum_{i=1}^n f(X_i)\Bigr\|_{\mathcal F} \le L'\Bigl(\sqrt v\,\sqrt{n\sigma^2}\,\sqrt{\log\frac{AU}{\sigma}} + vU\log\frac{AU}{\sigma}\Bigr) \tag{57}$$


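[see, e.g., Giné and Guillou (2001)]. If $\sigma < U/2$ we may replace $A$ by 1 at the price of changing the constant $L'$. Then, if

$$n\sigma^2 \ge C\log\frac{U}{\sigma} \tag{58}$$

for some constant $C$ we obtain

$$E\Bigl\|\sum_{i=1}^n f(X_i)\Bigr\|_{\mathcal F} \le L''\sqrt{n\sigma^2}\,\sqrt{\log\frac{U}{\sigma}} \quad\text{and}\quad V \le L'''n\sigma^2 \tag{59}$$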



for constants $L''$, $L'''$ that depend only on $A$, $v$, $C$. Combining these estimates with Talagrand's inequality (56), it is easy to obtain [as in Corollary 2.2 in Giné and Guillou (2002)] that there exist constants $R$ and $C_1$ depending only on $A$ and $v$ such that for all $C_2 \ge C_1$, if

$$C_1\sqrt n\,\sigma\sqrt{\log\frac{U}{\sigma}} \le t \le C_2\frac{n\sigma^2}{U}, \qquad \sigma < U/2,$$

and (58) are satisfied, then

$$\Pr\Bigl\{\max_{k\le n}\Bigl\|\sum_{i=1}^k f(X_i)\Bigr\|_{\mathcal F} > t\Bigr\} \le R\exp\Bigl(-C_3\frac{t^2}{n\sigma^2}\Bigr), \tag{60}$$

where $C_3 = \log(1 + C_2/L''')/(RC_2)$. In particular, for $u \ge C_1$, with $L = L''' \vee R$,

$$\Pr\Bigl\{\max_{k\le n}\Bigl\|\sum_{i=1}^k f(X_i)\Bigr\|_{\mathcal F} > u\sqrt{n\sigma^2}\,\sqrt{\log\frac{U}{\sigma}}\Bigr\} \le L\exp\Bigl\{-\frac{u\log(1+u/L)}{L}\log\frac{U}{\sigma}\Bigr\}.$$



These tail probabilities are of Poisson-type, and an easy (but somewhat 
cumbersome) computation yields that, for all $\lambda > 0$, 



p /, iiEii/(^^)ii^ 

exp < A max , — , 

I k<n y/^^log{U/a) 



(61) 



< D{A,v,Ci,L){l + ^ \L{e^^L/iog{u/a) _ 1) 

xexp{AL(e2^^/'°s(^/'^)-l)}). 